A QUEST INTO THE HUMAN BRAIN


By Mattia Maroso


Our brain is composed of 86 billion neurons and a similar number
of non-neuronal cells. The National Institutes of Health’s Brain Research
through Advancing Innovative Neurotechnologies (BRAIN)
Initiative - Cell Census Network (BICCN), which was first launched 
in 2017, is a consortium of centers distributed in the United States 
and Europe that work together toward the goal of characterizing 
cell types and their functions in the brains of humans, nonhuman 
primates (NHPs), and rodents. 

In Science, Science Advances, and Science Translational Medi- 
cine, we present the results of this monumental effort. By leveraging the 
most-advanced technologies, which, until now, were mostly applied to ani- 
mal models, the studies examine the cellular composition of the adult and 
developing human brain at the transcriptional, epigenetic, and functional 
levels. The studies are organized around five main themes: (i) adult hu- 
man single-cell atlasing, including studies using single-cell transcriptomic 
and epigenomic analyses to characterize the human brain; (ii) adult NHP
single-cell atlasing, which focuses on similar single-cell analyses in marmoset
and macaque brain; (iii) comparative single-cell analyses that compare
cellular composition in human versus NHP brain; (iv) human and NHP 
brain development single-cell analyses that focus on the characterization 
of developmental dynamics in developing human and NHP brain; and (v)
human neuronal cell-type functional and anatomical analysis and model- 
ing, which includes the physiological and anatomical characterization of 
cellular properties in living human tissues, and the modeling of cell types 
and specialized cellular properties in humans versus rodent models. 

The data collected by the BICCN will now allow researchers to tackle fun- 
damental scientific questions about the human brain and its genetic orga- 
nization. The era of cellular human brain research is knocking at our door! 
10.1126/science.adl0913 
BRAIN CELL CENSUS


PERSPECTIVE 


A family portrait of human brain cells 


A cell census provides information on the source of human brain specialization 


By Alyssa Weninger and Paola Arlotta


The brain is composed of multiple
regions associated with distinct func- 
tions, which have become further 
specialized in the human lineage. 
To define how this specialization is 
implemented, how it arises during 
development, and how it has emerged over 
the course of human evolution, a detailed 
understanding of the cells that make up the 
human brain is required. The development 
of technology for high-throughput profil- 
ing of single cells within tissues, across both 
genomic and functional characteristics, now 
makes this a feasible goal. On pages 175, 176, 
178, 177, 174, 170, 179, 173, and 181 of this issue, 
Siletti et al. (1), Jorstad et al. (2), Chartrand et 
al. (3), Lee et al. (4), Tian et al. (5), Johansen 
et al. (6), Jorstad et al. (7), Velmeshev et al. 
(8), and Kim et al. (9), respectively, present 
comprehensive single-cell compendia of the 
brains of humans and nonhuman primates 
during development and in adulthood. 

This work emerges from the vision of the 
National Institutes of Health (NIH)’s Brain 
Research through Advancing Innovative 
Neurotechnologies (BRAIN) Initiative - Cell 
Census Network (BICCN), an ambitious ef- 
fort that brings together a large spectrum of 
laboratories across many disciplines to iden- 
tify, characterize, and map every cell type of 
the brain in humans, nonhuman primates, 
and mice. The findings provide new perspec- 
tives on the role played by cell type diversity 
in regional and species-specific differences 
across the human brain. These data can also 
inform the development and selection of 
models for different experimental questions, 
ranging from basic understanding to mecha- 
nisms of human disease. 

Siletti et al. applied single-cell RNA se- 
quencing to samples from nearly 100 ana- 
tomical locations in the adult human brain 
to demonstrate that, although some brain re- 
gions have distinct cell types, many regions 
differ primarily in the relative proportions of 
a shared group of cell types. This conclusion 
also emerged from Jorstad et al. (2), who pro- 
filed eight areas of the adult human cerebral 
cortex to show that most areas contain the
same major classes of cells and differ mainly 
in the proportions of each type and the cor- 
tical layer to which non-neuronal cells are 
localized. There were exceptions to this gen- 
eral rule—for example, the primary visual 
cortex contained area-specific inhibitory 
neuron types. Overall, the data support an 
evolutionary strategy for the diversification 
of regional function that largely does not de- 
pend on the generation of new cell types but 
rather uses small variation within cell types 
and changes in their relative distribution to 
create distinct circuitry. 

Although methods for high-throughput 
single-cell profiling are most developed for 


transcriptional assays, no one modality is suf- 
ficient to fully define cell identities. Chartrand 
et al., Lee et al., and Tian et al. applied combi- 
natorial analyses that profile individual cells 
across multiple modalities, such as transcrip- 
tome, epigenome, and functional assays, to 
reveal regional diversity that was not readily 
apparent from transcriptomic profiles alone. 
Not unexpectedly, besides the variation 
among brain regions, there is also variation 
between the brains of different individuals. 
There is no single prototypical human; a 
spectrum of differences in genetic variation
and environmental response exists both in 
healthy individuals and in disease states. 

Single-cell mapping of the brain

High-throughput single-cell profiling of the transcriptome, proteome, or epigenome can be used to assign cells
into clusters on the basis of similar characteristics. Plots of these clusters can help improve understanding
of the specialization of the human brain on several levels. Profiling of the adult human brain showed that
most brain regions contain the same major classes of cells and differ mainly in the proportions of each type
(1). Comparison of the human brain with those of nonhuman primates provided information on how well
these species model human physiology (2). Single-cell profiling of the brain during prenatal and postnatal
development aided understanding of how brain specialization is achieved and can be compared with similar
maps from stem cell-based organoids to improve in vitro models (3).

Panels: (1) Specialization of brain regions; (2) Specialization of the human brain; (3) Specialization during development.

Johansen et al. transcriptionally profiled neocortical
cells from an array of 75 individual
brains and found that only a minority of in- 
terindividual differences could be explained 
by general demographic indicators, such as 
age, sex, ancestry, or disease state. Notably, 
different classes of cells showed distinct de- 
grees of interindividual variability—for ex- 
ample, excitatory neurons displayed greater 
variation compared with interneurons. The 
source of this variation is as yet unclear, but 
the observation adds to a body of emerging 
evidence for high interindividual variability 
that points to the necessity of studying large 
cohorts of humans irrespective of the biologi- 
cal questions being asked. 

Among primates, humans have greatly ex- 
panded cognitive abilities, such as language 
and abstract thinking. Fundamental ques- 
tions remain regarding what fine cellular- 
and circuit-level differences endow human 
brains with these capacities and how these 
differences have evolved. Jorstad et al. (7) 
performed single-cell transcriptional profil- 
ing of the middle temporal gyrus—a region 
critical for language comprehension—across 
five primate species: human, chimpanzee, 
gorilla, rhesus macaque, and marmoset. They 
found that although primates largely share 
the same, conserved cell types, they show 
substantial differences in cell proportions. 
Notably, although in general glia are less di- 
verse than neurons within species, microglia, 
astrocytes, and oligodendrocytes showed 
greater transcriptional differences between 
species compared with neurons and had a 
faster evolutionary divergence in the tran- 
scriptome. Only a few hundred genes showed 
human-specific expression patterns, and 
these were disproportionately near genomic 
regions with signs of evolutionary selection 
in humans. These results suggest that the 
specific properties of the adult human cortex 
may derive from relatively few cellular and 
molecular changes. 

These types of highly resolved compara- 
tive analyses not only fuel mechanistic 
understanding of human brain evolution, 
a field that has traditionally been largely 
descriptive, but are also critical for under- 
standing how well different species can 
model human brain physiology, pathology, 
and therapeutic response. This understand- 
ing can begin to define the differences that 
may make a particular model species better 
suited for addressing certain experimental 
questions over others. 

The identified interspecies differences also 
highlight the need to complement the use of 
animal models with human model systems. In 
particular, stem cell-based in vitro systems, 
such as human brain organoids (10), have 
emerged as powerful experimental models to 
probe human brain biology. Organoids also
have the potential to allow high-throughput
modeling of cells and circuits across multiple
humans to empower the study of interindividual
differences. Understanding how to
build organoids, and whether they replicate
features and properties of the native tissue,
requires comprehensive datasets of the
human brain not only in the adult but also
across development.

Different types of neuronal and glial cells (indicated by color), defined using single-cell transcriptome profiling,
are present in different locations within the human cerebral cortex. IN, interneuron; L, layer.

Toward this goal, Velmeshev et al. performed
single-cell transcriptional and epigenomic
profiling of different cortical areas
and the ganglionic eminences (transient fetal 
brain structures that are the primary source 
of cortical interneurons) during human pre- 
natal and postnatal development (see the 
images). Notably, they were able to include 
difficult-to-obtain samples from late stages of 
gestation and early postnatal development, 
providing an essential look into processes 
occurring at key stages of infancy. Kim et 
al. present a deep single-cell transcriptomic 
investigation of the developing human thala- 
mus, which is critical for central processing 
of sensory information but has lacked a de- 
tailed investigation of cellular composition. 
The results of both studies highlight mo- 
lecular mechanisms driving cellular diver- 
sification in the human brain and inform 
strategies for the development of new in vitro 
systems to expand the compendium of brain 
regions that can be experimentally modeled 
(see the figure). 

Since the beautiful images of Santiago 
Ramón y Cajal provided a first glimpse into


the immense diversity and complexity of 
cell types found in the brain, neuroscience 
has been challenged and inspired to under- 
stand how these diverse cells ultimately pro- 
duce the brain’s abilities. Today, the scope 
and depth of these studies from the BICCN 
demonstrate the potential of this type of 
large cooperative initiative to generate this 
fundamental knowledge of the human brain. 
Efforts of this type enable progress toward 
the collective goal of profiling the entirety 
of the brain, thus building a foundation for 
understanding how the human brain is made 
and functions and empowering a new era of 
experimental investigation into the causes 
of human neurological disease. 
INTRODUCTION: Interindividual variation is a
known feature of the human brain. In neo- 
cortical regions, broad classes of neurons and 
non-neuronal cells vary in abundance and 
gene expression in healthy adult humans, 
yet variation remains unexplored within finer 
divisions of cell types. Characterizing the ef- 
fects of biological factors such as age, sex, 
ancestry, disease, or genetic variants on high- 
resolution cell types requires molecular profil- 
ing of single nuclei from large donor cohorts. 
RATIONALE: Single-nucleus RNA-sequencing
(snRNA-seq) combined with whole-genome
sequencing (WGS) applied to a large cohort of
donors enables comprehensive assessment of
interdonor variation in the neocortex of non- 
aged adults. Integrated analyses of demographic 
characteristics, single-nucleotide polymorphisms 
(SNPs), and gene expression can implicate genes 
associated with biological factors and gene 
regulatory regions in specific cell types. Such 
work can also serve as an important baseline to
interpret cellular variation in neurological and
psychiatric diseases that affect cortical function,
and for understanding changes with age and
associated neurodegenerative disorders.

snRNA-seq profiling reveals interindividual variation in human cortical cell types. The comprehensive
transcriptomic (snRNA-seq) and genomic (WGS) profiling of cortical tissues from 75 human brain donors
comprises nearly 400,000 nuclei and covers all major neocortical cell types, each with distinct molecular
signatures. These cell types exhibited substantial variation in gene expression and abundances between
individuals driven by multiple biological and genetic factors.


RESULTS: We present comprehensive transcrip- 
tomic (snRNA-seq) and genomic (WGS) profil- 
ing of cortical tissues from 75 human brain 
donors comprising nearly 400,000 nuclei cov- 
ering all major neocortical cell types. We show 
quantitatively that nearly all cells collected 
from these adult donors can be confidently 
assigned to a cell type in a taxonomy defined 
by using only a few donors. This demonstrates 
a highly consistent cellular architecture across 
individuals and confirms the viability of cell type 
mapping for larger-scale studies. These highly 
conserved cell types showed substantial varia- 
tion in gene expression and abundances between 
individuals driven by multiple biological factors. 
Underlying medical conditions in our donor 
cohort affect cellular abundance. For example, 
PVALB-expressing interneurons show decreased 
abundance in epilepsy cases, reflecting pre- 
viously reported cell loss in this disease. 

We found differences in gene expression 
across individuals at the finest cell type reso- 
lution. Gene networks in excitatory neurons 
and glia were particularly variable across donors, 
irrespective of medical condition or brain region. 
Deep-layer neuronal types that communicate 
with distant brain regions showed higher variation
than that in superficial types. A substantial
proportion of variation in gene expression
is explained by donor, including contributions 
from age, sex, ancestry, and disease state. Fur- 
thermore, genomic variation was significantly 
associated with variable gene expression, with 
most cell types containing cis-expression quan- 
titative trait loci. Yet much variation remains 
unexplained by measured factors, paving the 
way for larger studies of similar design. 


CONCLUSION: By profiling the human brain 
across 75 adult individuals through use of 
snRNA-seq and WGS, we assessed variation in
cortical cellular abundance and gene expres- 
sion at cell type-level resolution. This study 
indicates a highly consistent cellular makeup 
across human individuals but with substantial 
variation that reflects donor characteristics, 
disease condition, and genetic regulation that 
will provide a comprehensive reference for 
future studies of disease. 
Single-cell transcriptomic studies have identified a conserved set of neocortical cell types from small
postmortem cohorts. We extended these efforts by assessing cell type variation across 75 adult
individuals undergoing epilepsy and tumor surgeries. Nearly all nuclei map to one of 125 robust cell types
identified in the middle temporal gyrus. However, we found interindividual variance in abundances
and gene expression signatures, particularly in deep-layer glutamatergic neurons and microglia. A
minority of donor variance is explainable by age, sex, ancestry, disease state, and cell state. Genomic
variation was associated with expression of 150 to 250 genes for most cell types. This characterization
of cellular variation provides a baseline for cell typing in health and disease.


The human brain displays well-established
interindividual variability in regional activity,
morphology, connectivity (1), and brain
size and shape (2). This variation necessitates
the use of common coordinate
frameworks in which individual human brain 
maps are morphed to fit a representative aver- 
age for direct comparisons across individuals 
(3). Brain structure relates to many demo- 
graphic and behavioral variables (4), and genet- 
ic variants affect the structure of subcortical 
(5) and cortical areas, including temporal cor- 
tex (6) and structural left-right asymmetry (7). 
Furthermore, functional networks defined with 
resting-state functional magnetic resonance 
imaging (fMRI) associate with ion channel and
synaptic gene networks in neurotypical adult 
donors (8), suggesting a link between function 
and gene expression. Common genetic variants 
also influence intrinsic brain activity in brain 
disorders such as schizophrenia and major 
depressive disorder (9). 

Little is known about population differences 
in cell type distributions and gene expression 
in healthy individuals, and about how these 
differences are affected by genetic, environ- 
mental, and demographic factors. Neocortical 
neurons are established in prenatal and early 
postnatal development and remain relatively
constant in makeup (10) and number (11) into
adulthood; similarly, oligodendrocytes have 
been implicated in adult brain variability start- 
ing from prenatal neurodevelopment (12), 
whereas microglia changes have been mostly 
associated with response to inflammatory stress 
(13). Single-nucleus RNA-sequencing (snRNA- 
seq) can directly probe such potential cell 
type-specific changes in gene expression and 
cell type abundances across individuals. Stud- 
ies involving relatively few (n = 3 to 8) healthy 
adult humans have shown strong interindi- 
vidual concordance in the cortex because all 
cell types include representation from multiple
donors without the need to explicitly account
for donor of origin in the analysis (14, 15).
However, glutamatergic (excitatory) cells show
greater gene expression changes across cortical
regions than do γ-aminobutyric acid-ergic
(GABAergic) (inhibitory) cells in both mice
(16) and humans (17), and supragranular glutamatergic
neurons have been shown to be
highly variable across cell type, cortical depth,
donor, and species (14, 15, 18), indicating some
interindividual differences. Larger studies of 
Alzheimer’s disease (AD) reported high varia- 
bility in cell type proportions and gene ex- 
pression related to disease phenotypes (19); 
however, these studies are currently lacking 
in healthy individuals. 

Heterogeneity in gene expression across in- 
dividuals can arise from biological factors such 
as age, sex, ancestry, random developmental 
variation in brain wiring (20), or genetic variants.
Combined genotype and snRNA-seq data
permit investigation of the functional effects
of disease-associated variants and identification
of genes whose expression is under genetic
control through expression quantitative trait
locus (eQTL) analysis. Previous studies have
identified brain-specific eQTLs (21), some of
which have been associated with diseases such
as AD (19, 21, 22). However, these studies focus
on more coarsely defined cell types (such as
“neurons” or “GABAergic interneurons”), and
the extent to which genetic variants can modulate
expression within finer cell type annotations
[such as specific subtypes of vasoactive
intestinal peptide-positive (VIP⁺) GABAergic
interneurons] remains to be completely elucidated.
In this study, we investigated variation
in cortical cellular abundance and gene expression
across 75 adult individuals to provide
valuable insight on how cell type variation
could reflect donor characteristics, disease condition,
and the underlying genomic landscape.

13 October 2023


Sample collection and processing 


Cortical tissue collected from 75 individuals 
undergoing neurosurgery for intractable epi- 
lepsy and/or tumor removal (Fig. 1A and table 
S1) was used for droplet-based snRNA-seq and 
whole-genome sequencing (WGS). After ex- 
cluding outliers, an average of 4597 nuclei per 
individual [interquartile range (IQR), 3344 to 
5396] was included in this study (Fig. 1A). 
Using an iterative approach (23), we mapped 
these nuclei to a human middle temporal gyrus
(MTG) classification of 131 highly predictable, 
fine-grained cell types (called “supertypes”), 
including 125 based on postmortem tissue 
from neurotypical donors (http://sea-ad.org) 
and six additional non-neuronal types from 
this cohort (Fig. 1B, fig. S1, and Materials and 
methods). As with postmortem donors, most 
cell types were rare except for several intra- 
telencephalic (IT) neuron and glial types (Fig. 
1C). Because of this overall sparsity of cells 
within supertypes, we focused many compar- 
isons on 24 high-fidelity subclass assignments 
(fig. S1). 


Population variation in cell type abundances 


We first investigated whether demographic 
metadata, brain region, or donor medical con- 
ditions could explain the variance seen in gene 
counts and cell type abundances using a linear 
model. Variability in gene counts was not as- 
sociated with any metric for any cell type (fig. 
S2). By contrast, some abundance changes 
were associated with medical condition and brain
region (Fig. 2, A and B, and fig. S3A). For ex- 
ample, layer 5 extratelencephalic (L5 ET) neu- 
rons had lower abundance in tumor cases 
[false discovery rate (FDR) < 0.01] even after 
accounting for regional differences [frontal 
cortex (FRO) < MTG, FDR < 0.01]. Parvalbumin 
(Pvalb) interneurons were slightly reduced in 
epilepsy cases (uncorrected P < 0.05), which is 
consistent with previous reports that demonstrated
reduced PVALB⁺ interneurons in certain
Fig. 1. Study design and overall gene expression variation in neurosurgical cohort. (A) Summary of data types, individuals, demographics, and quality metrics
in this study. (B) Schematic of quality control and cell type assignment for single nuclei collected in this study. (C) (Top) Dendrogram of reference supertypes 
from this study (identical to the taxonomy from http://sea-ad.org). (Bottom) Number of cells from this study mapped to each supertype. 


focal cortical dysplasias (24) and similar changes 
in epilepsy (25). Regional changes in L4 IT
(Fig. 2B) and L6b neurons (fig. S3A) likely re- 
flect differences in cytoarchitecture between 
cortical areas. Cell type abundances did not 
significantly differ by sex (uncorrected P = 0.18) 
(fig. S3B). Only oligodendrocyte progenitor 
cells (OPCs) showed significant association with 
age, decreasing approximately twofold from 
ages 20 to 70 (P = 1.2 x 10 *) (Fig. 2A and fig. 
S3C), mirroring decreased generation of OPC 
daughter cells reported in mouse hippocam- 
pus between 6 and 24 months (26). 
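
The FDR thresholds quoted above imply a multiple-testing correction across cell types and covariates. The sketch below assumes a Benjamini-Hochberg adjustment, which this section does not name, and uses hypothetical P values purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: the FDR values in the text come from a BH-style procedure;
    the section itself does not state which correction was used.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values, one per tested subclass-covariate pair.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw_p)
significant = [q < 0.01 for q in fdr]
```

With these made-up inputs only the first test passes an FDR cutoff of 0.01, even though two raw P values are below 0.01.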

Johansen et al., Science 382, eadf2359 (2023)

Proper local and global brain network dynamics
require a tight balance of excitation
and inhibition (27). Dysregulation of this balance
can affect whole brain dynamics (28) and
is implicated in neurodevelopmental disorders,
such as autism spectrum disorder, for which
an increased excitatory-inhibitory (E-I) ratio
has been linked to memory, cognitive and
motor deficits, and seizure development (29).
A breakdown of this E-I balance has been
directly recorded in humans during seizures
(30), but it is unknown whether such dynamics
relate to differences in underlying distribution
of cells of various types. To address this,
we calculated the E-I ratio for each donor and
applied the linear model described above (Fig.
2C). As in the primary motor cortex (14), the
average E-I ratio in human was approximately
2. The spread of E-I ratio was quite large
across individuals (SD = 0.66), and this high
variation was not associated with medical condition,
sex, or brain region. These results do
not confirm the hypothesis that E-I ratio is critical
for network dynamics in this context.
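
As an illustration of the per-donor E-I ratio, the sketch below assumes that the ratio is the count of glutamatergic nuclei divided by the count of GABAergic nuclei. The subclass groupings, donor names, and counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

# Assumed grouping of subclasses into excitatory and inhibitory classes.
EXCITATORY = {"L2/3 IT", "L5 ET"}
INHIBITORY = {"Pvalb", "Vip", "Sncg"}


def ei_ratio(counts):
    """Excitatory-to-inhibitory ratio from a subclass -> nucleus-count mapping."""
    excitatory = sum(n for name, n in counts.items() if name in EXCITATORY)
    inhibitory = sum(n for name, n in counts.items() if name in INHIBITORY)
    if inhibitory == 0:
        raise ValueError("no inhibitory nuclei; E-I ratio is undefined")
    return excitatory / inhibitory


# Hypothetical counts per donor; non-neuronal cells are ignored by ei_ratio.
donor_counts = {
    "donor_a": {"L2/3 IT": 600, "L5 ET": 200, "Pvalb": 250, "Vip": 150, "Astro": 300},
    "donor_b": {"L2/3 IT": 800, "L5 ET": 100, "Pvalb": 200, "Vip": 100},
    "donor_c": {"L2/3 IT": 450, "L5 ET": 50, "Pvalb": 300, "Sncg": 200},
}

ratios = {donor: ei_ratio(counts) for donor, counts in donor_counts.items()}
ratio_mean = mean(list(ratios.values()))
ratio_sd = stdev(list(ratios.values()))
```

The mean and sample SD across donors correspond to the two summary numbers reported in the paragraph above (an average near 2 and SD = 0.66 in the actual cohort).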


More variability in surgical compared with 
postmortem tissue 


We next assessed changes in gene expression 
and cellular abundance between 45 individ- 
uals undergoing epilepsy surgery in MTG and 
compared them with the three donors from 
the reference MTG taxonomy with unbiased 
Fig. 2. Cell type-specific differences in abundance across donors. (A) (Top) Fraction of cells per donor
from each subclass, scaled to the total number of cells per class. For all bar plots, points indicate values
per donor, with bars showing mean ± standard error. (Bottom) Significant associations between abundances
and metadata based on a linear model, with covariates for listed metadata and batch. (B) Abundances
after dividing donors by medical condition and brain region for select subclasses with significant changes.
(C) Bar plots as in (B) but showing the excitatory-inhibitory (E-I) ratio per donor (E-I ratio; y axis).


sampling. Similar numbers of genes were de- 
tected per subclass across tissue sources; how- 
ever, neurosurgical cases had approximately 
twice the variation between individuals (P = 2.8 
x 10 ‘) (fig. S4A). In neurons, most genes showed 
consistent expression by tissue source [for 
example, correlation coefficient (r) = 0.96,
P ≈ 0 for Pvalb], whereas the agreement was
less consistent in non-neurons (for example,
r = 0.72, P ≈ 0 for Oligodendrocytes), with
more genes having higher expression in neu- 
rosurgical than postmortem cases rather than 
vice versa (fig. S4B). Similarly, we found good 
agreement in the overall abundances of most 
cell types between tissue types (r = 0.98, P = 6.4
x 10°” (fig. S4C), although nearly all cell types
showed increased variation in abundances in
neurosurgical tissues [coefficient of variation 
(COV) in neurosurgical tissue (NS) > COV in 
postmortem tissue (PM) for all types except 
L6 IT Car3] (fig. S4D). This might be due to 
differences in neurosurgical tissue dissections 
because the abundances of subclasses located 
in superficial cortical layers (such as Vip, Sncg, 
Pax6 interneurons, and L2/3 excitatory neu- 
rons) were more highly correlated with each 
other than with subclasses located in deeper 
layers (fig. S4E). Last, as with tumor cases 
(Fig. 2B), relatively fewer Pvalb neurons were 
seen in neurosurgical than in postmortem 
cases (P = 0.025) (fig. S4F). Overall, these results
suggested more variability in neurosurgically
derived compared with postmortem-derived
tissue. Although some variation can be
explained technically, further analysis of pop- 
ulation variation is needed to determine the 
source of this variability. 
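
The coefficient of variation (COV) used in this comparison is the standard deviation divided by the mean. The sketch below applies it to hypothetical abundance fractions for one cell type in neurosurgical (NS) and postmortem (PM) donors; the numbers are invented, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m


# Hypothetical per-donor abundance fractions for a single cell type.
ns_abundance = [0.08, 0.12, 0.05, 0.15]  # neurosurgical donors
pm_abundance = [0.09, 0.10, 0.11]        # postmortem donors

cov_ns = coefficient_of_variation(ns_abundance)
cov_pm = coefficient_of_variation(pm_abundance)
more_variable_in_ns = cov_ns > cov_pm
```

Because COV is scaled by the mean, it allows rare and abundant cell types to be compared on the same footing.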


Cell type-specific population variation in 
gene expression 


To gain an initial understanding of cell-to-cell 
variability, we plotted a uniform manifold ap- 
proximation and projection (UMAP) of all 
snRNA-seq data using highly variable genes 
without additional data integration. Cells 
showed clear separation by subclass (Fig. 1B), 
and most supertypes also visually segregated 
within subclasses (fig. S1), indicating largely 
consistent gene expression across donors. 
However, interindividual variability was highly 
dependent on cell type. Interneurons showed 
little variability among donors, whereas gluta- 
matergic neurons and some non-neurons 
showed more evident interindividual differ- 
ences (Fig. 3A), suggesting more variable 
expression profiles. Donor entropy, which 
measures the number of donors of origin 
for nuclei in a local neighborhood, was used to 
quantify these observations (Fig. 3B) and identified
1971 nuclei from two donors (fig. S5, A
to C) that expressed tumor markers, likely 
representing infiltrating tumor cells. Ki67 
staining demonstrated the presence of prolif- 
erating cells in these donors (fig. S5B), whereas 
Ki67⁺ cells were generally low or absent in
other donors (fig. S5D). 
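
One way to picture donor entropy is as the Shannon entropy of donor labels among each nucleus's nearest neighbors in the embedding. The sketch below is an assumption about how such a score could be computed, not the authors' implementation; the coordinates, donor labels, and neighborhood size `k` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def donor_entropy(coords, donors, k):
    """Entropy of donor labels among the k nearest neighbors of each nucleus."""
    scores = []
    for i, point in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        # Brute-force nearest neighbors by Euclidean distance.
        nearest = sorted(others, key=lambda j: math.dist(point, coords[j]))[:k]
        scores.append(shannon_entropy([donors[j] for j in nearest]))
    return scores


# Hypothetical 2D embedding: one donor-specific cluster and one mixed cluster.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
donors = ["d1", "d1", "d1", "d1", "d2", "d1", "d2"]
scores = donor_entropy(coords, donors, k=2)
```

Low scores flag neighborhoods dominated by a single donor, such as the tumor-marker-expressing nuclei described above, whereas well-mixed neighborhoods score higher.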

To identify genes driving population varia- 
tion, we grouped cells in subclasses and com- 
pared the gene distance between pairs of nuclei 
from the same donor against pairs from dif- 
ferent donors, defining “high-variance genes” 
as those showing significant difference, with 
FDR < 0.00494 (this cutoff reduced the likeli- 
hood of false positives) (Materials and methods). 
Glutamatergic neuron types from layer 6 and 
microglia had a larger fraction of high-variance 
genes than that of any GABAergic or non- 
neural types (fraction > 0.5 versus fraction < 
0.35) (Fig. 3C and fig. S6A). Gene counts were 
not correlated with the number of cells per 
subclass (fig. S6B), suggesting that this result 
likely reflected a biological difference. Many 
high-variance genes have known biological 
reasons for their high interindividual variabil- 
ity (FDR < 0.00494 in all cases) (Fig. 3D and 
fig. S6, C and D). For example, genes on the Y 
chromosome (UTY and TTTY14) and XIST, a
key regulator of X-chromosome inactivation
in females (31), are particularly variable across
subclasses—likewise for immediate early genes 
(FOS, JUN, and JUND), whose expression can 
change rapidly in response to extracellular 
stimuli (32) and were higher in neurosurgical 
relative to postmortem tissue (15). Last, genes
associated with the amount of nuclear RNA, 
including housekeeping genes (GAPDH and 
Fig. 3. Donor-specific gene signatures differ by cell type and sex. (A and B)
and donor characteristics. Heatmap shows the fraction of nuclei of a given subclass


ACTB) and the highly expressed and nuclear- 
localized transcript MALAT1 (33), showed high 
interdonor variability. 
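
The within-donor versus between-donor comparison used to define high-variance genes can be sketched for a single gene as follows. The distance metric (absolute expression difference) and the comparison of means are assumptions made for illustration; the actual metric and statistical test are given in the paper's Materials and methods, and the expression values here are invented.

```python
from itertools import combinations
from statistics import mean


def pair_distances(expression_by_donor):
    """Mean absolute expression difference for same-donor and cross-donor pairs.

    expression_by_donor maps a donor ID to expression values of one gene
    in that donor's nuclei (all from one subclass).
    """
    labeled = [(donor, value)
               for donor, values in expression_by_donor.items()
               for value in values]
    within, between = [], []
    for (donor_1, value_1), (donor_2, value_2) in combinations(labeled, 2):
        target = within if donor_1 == donor_2 else between
        target.append(abs(value_1 - value_2))
    return mean(within), mean(between)


# Hypothetical log-expression of one gene in two donors.
gene_expression = {"d1": [1.0, 1.2], "d2": [5.0, 5.2]}
within_mean, between_mean = pair_distances(gene_expression)
```

A gene whose cross-donor distances clearly exceed its same-donor distances, as in this toy example, is a candidate high-variance gene; a formal test with FDR control would still be needed to call it significant.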

Next, we performed gene ontology (GO) en- 
richment analysis separately on variable genes 
for each subclass (FDR < 0.001 for all terms) 
(fig. S6E). Glial types were significantly en- 
riched for “glutamatergic synapse” genes, cap- 
turing neuron-glia and glia-glia signaling and 
representing the most variable genes when 
comparing human to nonhuman primate (34). 
“Inflammatory response” genes showed high 
variability selectively in microglia (Fig. 3D and 
fig. S6D); likewise, moderate IBA1 reactivity 
(pathology scoring between 1 and 2 on a scale 
from 0 = none to 3 = extensive) was found by
using histology for one-third (31 of 90) of a
partially overlapping set of individuals [details
are available in (18)], suggesting variable
abundance of reactive microglia. If present, 
GO categories enriched for neuronal subclasses 
were related to generic neuronal processes 
(fig. S6E), providing little biological insight. 


Distinct gene profiles in nuclei from females 


Many sources of variation can affect gene ex- 
pression within a subclass, including supertype, 
cortical cell depth (18), experience-dependent
transcriptomic states (33, 35), disease state 
(19), or demographic information. We used 
random forest (RF) classification (36) to iden- 
tify donors with nuclei that show distinct gene 
signatures and asked whether these donors 
exhibited specific features. RF prediction ac- 
curacies varied widely by donor and subclass 
(Fig. 3E, heatmap). Glutamatergic neurons 
were more predictive of donor overall than 
were most GABAergic and non-neuronal types 
(Fig. 3, E and F), and these results were con- 
sistent whether variable genes or principal 
components were used for RF prediction (r =
0.92, P = 2.0 x 10°) (fig. S7A). RF predictability
of donor and the fraction of high-variance
genes showed high concordance (r = 0.88, P =
1.5 x 10°°) (Fig. 3G), with deep-layer glutamatergic
types and microglia showing the largest
values by both metrics of interindividual var- 
iability. Female donors showed more RF pre- 
dictability than that of males in most types 
(Fig. 3, E and F). This can be partially explained 
by sex chromosome genes, which had particu- 
larly high RF relevance and population variabil- 
ity, and more RF relevance in females than males 
(fig. S7, B and C, and S8). Last, few subclasses 
showed differences in RF predictability by brain 
region or medical condition (Fig. 3, E and F). 


Multiple contributing factors of gene variation 


To reveal the contribution of biological factors 
such as ancestry (37), sex, age (38), and dis- 
ease state (19) as well as technical factors on 
transcriptomic variation, we used variation par- 
titioning (39) to fit a subclass-specific linear 
mixed model to the snRNA-seq expression per
gene. Thus, we divided the per-gene total var- 
iance into contributions from each factor and 
modeled the unexplained (or “residual”) vari- 
ation (Fig. 4A), with total contributions sum- 
ming to 1. The number and magnitude of 
donor and supertype-associated genes ranged 
widely across subclass (Fig. 4B). Glutamatergic 
neurons had greater donor-associated contri- 
butions to variation than those of GABAergic 
and non-neuronal types, mirroring quantifica- 
tions of donor entropy (Fig. 3B). Deep gluta- 
matergic types had more donor-associated 
genes (n = 1114; >20% variance explained)
than L2/3 IT types (n = 102), likely explaining
their improved RF performance (Fig.
3E), and again highlighting their interdonor 
variability. Additionally, GABAergic neurons 
contained excess supertype-associated genes 
(Fig. 4B), reflecting the many supertypes per 
GABAergic subclass (fig. S9). Together, these 
results point to conserved expression of marker 
genes for GABAergic supertypes. 
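
As a rough illustration of what a per-gene variance fraction means, the sketch below splits the total sum of squares of one gene's expression into a between-donor part and a residual part, so that the two fractions sum to 1. It is a one-factor simplification, not the subclass-specific linear mixed model used in the study, and the function name and toy values are hypothetical.

```python
from collections import defaultdict


def donor_variance_fractions(expression, donors):
    """Split the total sum of squares into donor and residual fractions.

    expression: per-nucleus expression values for one gene.
    donors: donor label for each nucleus (same length as expression).
    """
    if not expression or len(expression) != len(donors):
        raise ValueError("expression and donors must be non-empty and equal length")

    grand_mean = sum(expression) / len(expression)

    # Group expression values by donor.
    groups = defaultdict(list)
    for value, donor in zip(expression, donors):
        groups[donor].append(value)

    between = 0.0  # variation of donor means around the grand mean
    within = 0.0   # variation of nuclei around their own donor mean
    for values in groups.values():
        donor_mean = sum(values) / len(values)
        between += len(values) * (donor_mean - grand_mean) ** 2
        within += sum((v - donor_mean) ** 2 for v in values)

    total = between + within
    if total == 0:
        return {"donor": 0.0, "residual": 0.0}
    return {"donor": between / total, "residual": within / total}


# Toy data: two hypothetical donors with clearly different mean expression.
fractions = donor_variance_fractions(
    [1.0, 1.2, 0.8, 3.0, 3.1, 2.9],
    ["d1", "d1", "d1", "d2", "d2", "d2"],
)
```

A gene with a donor fraction above 0.2 in this toy calculation would be the analog of the ">20% variance explained" donor-associated genes described above, although the mixed model additionally accounts for the other covariates at the same time.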

As expected, genes with the greatest fraction 
of variance associated with sex were located 
on the X and Y chromosomes (table S2), including
the high-variance genes XIST, UTY,
and TTTY14 (Fig. 4C). Variation between individuals
accounted for >20% of variation in
many genes expressed in Pvalb interneurons 
(52) and L6b glutamatergic neurons (336), 
making this the most prominent known factor 
(Fig. 4C and table S3). LRRC37A had high
donor-associated variability in both Pvalb
(51.6%) and L6b (63.1%), whereas HIPK1 and
GRB2 showed such variability specifically in 
layer 6b (L6b) neurons (Fig. 4, C to F). Besides 
some pan-neuronal genes such as JUND and 
LRRC37A, the top donor-associated gene sets 
were conserved across subclasses and asso- 
ciated with common GO terms, including cell- 
cell adhesion (FDR < 1 x 10°”) and synaptic 
assembly (FDR < 1 x 107°). Each subclass also 
had distinct donor-specific gene sets, likely 
owing to different gene programs active in 
each type (Fig. 4E) (15, 40). Few genes varied 
with age (n = 76 and 88, >2% variation ex- 
plained), and these are predominantly sub- 
class-specific and contained in cell adhesion 
(FDR < 4.9 x 10“) or membrane (FDR < 1.1 x 10) 
GO categories. 

The great majority of genes in all subclasses 
had a large residual component (Fig. 4C and 
fig. S10), and the reason remains to be deter- 
mined. Highly and consistently expressed marker 
genes such as PVALB, SST, and SLC17A7 had
high residual variation only within the rele- 
vant subclass. Genes with critical functions 
such as ARID2, a known chromatin remodel- 
ing factor with prior cancer associations (41), 
were residual-associated and conserved across 
abundant glutamatergic neuron types. Last, 
the median gene expression and the amount 
of variance explained by the residual were negatively
correlated (fig. S11), suggesting a potential
effect of transcriptional noise such as
random fluctuations of lowly expressed genes.


Cell type-specific cis-eQTL effects on gene
expression variation 


To extend previous studies that linked genet- 
ic variants to variation in gene expression 
(21, 22, 42-46), we identified cell type-specific 
eQTLs from snRNA-seq and high-quality WGS
(fig. S12) performed for all 75 individuals. We 
identified an average of 200 eGenes, genes 
that are part of a cis-eQTL with FDR < 0.05, 
per subclass (Fig. 5A). However, the number 
of significant cis-eQTLs (FDR < 0.05) varied 
widely, with less abundant types, including 
vascular leptomeningeal cells (VLMCs) (n = 
637) and L6b (n = 8213), having fewer cis-eQTLs 
than more abundant types (Fig. 5B). The num- 
ber of detected eQTLs saturated when >15,000 
cells were sequenced in a subclass (Fig. 5B). 
Of the significant cis-eQTLs, we observed an 
enrichment of variants upstream of the tran- 
scription start site (TSS) or around the gene 
body as compared with variants downstream 
of the TSS (Fig. 5C). 
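
To make the idea of a cis-eQTL concrete, the sketch below regresses per-donor expression of one gene on genotype dosage (0, 1, or 2 copies of the alternate allele) by ordinary least squares. This is only a minimal illustration: the eQTL software, covariates, and FDR procedure used in the study are not reproduced here, and the function name and toy values are hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares fit of expression ~ genotype dosage.

    Returns (slope, intercept, r_squared). The slope is the estimated
    change in expression per additional copy of the alternate allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))

    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 0.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Toy data: six hypothetical donors, two per genotype class.
slope, intercept, r_squared = eqtl_slope(
    [0, 0, 1, 1, 2, 2],
    [1.0, 1.2, 2.1, 1.9, 3.0, 3.2],
)
```

In a per-subclass analysis, a fit like this would be repeated for each gene-variant pair within each subclass, which is one reason detection power depends on the number of nuclei and donors contributing to that subclass.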

We next sought to understand the specific- 
ity of each cis-eQTL to gene expression var- 
iation for each subclass. Cis-eQTLs in deep 
glutamatergic types associated with gene expression
variation for several genes, including
KDM1B in L6b (Fig. 5D). Additionally,
LRRC37A2, a paralog of LRRC37A that has
been identified as having donor-associated variation
in expression (47), was identified with
cis-eQTLs across most neuronal subclasses 
(Fig. 5E) that were largely in linkage disequi- 
librium with haplotypes of MAPT (table S4) 
(47). This result suggested neuronal cell type- 
agnostic genetic control of LRRC37A2, which 
should match bulk RNA-seq eQTL studies. 
An unbiased per-chromosome eQTL analysis 
of all variants identified subclass-specific eQTLs 
and eQTLs shared across excitatory and inhib- 
itory neurons (table S5). Together, these results 
showed that genetic control of gene expression 
can extend to finer cell type resolution than 
previously studied (21, 22, 42-46) and may be 
increasingly relevant for neurodegenerative 
or other disorders that preferentially affect 
specific cell types (19). 


Application to disease studies 


To demonstrate the utility of this approach 
for studying disease, we extended our analysis 
to 84 aged donors from the Seattle Alzheimer’s 
Disease Brain Cell Atlas (SEA-AD), evenly split 
between subjects with and without dementia. 
snRNA-seq data collected from MTG of SEA- 
AD donors were compared with the 45 indi- 
viduals who underwent epilepsy surgery in 
MTG. Aged donors with dementia showed 
higher variability in abundance compared with 
aged donors without dementia for most cell 
types (fig. S13, A and B). By contrast, variability
in abundance was higher in adult than aged 
donors, likely because of multiple biological 
and technical factors. Cell type abundances 
were generally well matched between groups, 
with Pvalb neurons again showing lower abun- 
dance in epilepsy (P = 2.7 x 10° when com- 
paring adult donors with epilepsy versus 
SEA-AD donors) (fig. S13C). The E-I ratio was 
similar between groups, indicating no obvious 
abundance shifts in broad neuronal classes in 
epilepsy or dementia (fig. S13D). 

Most subclasses likewise showed many more 
highly variable genes in demented versus nondemented
aged donors (fig. S14A and table
S6); similarly, genes showed higher COV in 
demented versus nondemented aged donors 
(fig. S14B). A small but expected set of genes, 
including sex chromosome genes (XIST) and
markers of nuclear content (MALAT1), showed
consistently high variability across groups 
(fig. S14, C and D, and table S7). Other genes 
showed selective variability in each donor 
group across multiple cell types. In GABAergic 
interneurons, for example, immediate early 
genes including FOS were expressed in some 
adult donors but minimally in SEA-AD data, 
likely reflecting neurosurgical conditions rather 
than changes with age. In addition, LRRC37A 
and genes with known MAPT interactions 
(FDR < 10 ®) were also variable only in adult 
individuals, potentially indicating a change 
in function of MAPT with age or a difference 
in the distribution of MAPT haplotypes be- 
tween cohorts. There were 102 genes varia- 
ble only in donors with dementia, including 
LINC00342, which has been shown to be down- 
regulated in Huntington’s disease (48). An 
enrichment analysis failed to identify signifi- 
cant GO categories. Last, 27 genes including 
five ribosomal genes (RPLAJ) (fig. S14D) showed 
selective variation in the aged, nondemented 
group (table S7). Altogether, the data suggest 
the potential to extend analysis of variation to 
studies of aging and disease. 


Discussion 


We present a comprehensive transcriptomic 
and genomic profiling of cortical tissues from 
75 human brain donors comprising nearly 
400,000 nuclei covering all major neocortical 
cell types. Our results reveal highly conserved 
cell types with substantial variation in gene 
expression and cell type abundances between 
individuals that are associated with multiple 
factors. Human glutamatergic neurons have 
higher interindividual gene expression varia- 
bility and predictability than those of GABAergic 
interneurons. This variability is most prom- 
inent in deep-layer glutamatergic types, in 
contrast with previous work showing that 
supragranular glutamatergic neurons are 
highly variable across many axes, including 
cell type, cortical depth, donor, species, and 
Fig. 4. Variation partitioning explains differences in cell type-specific gene
expression. (A) Interaction graph visualizing the estimated contribution of each 
covariate (green nodes) to the variance in expression for each gene (gray 
nodes) (covariate-gene edge), and the associated subclass (pink nodes) 
(gene-subclass edge). The covariate-gene edge weight indicates the percentage 
of variance explained for a gene by a covariate, and the gene-subclass edge 
weight indicates the nonresidual contributions to variance explained for the 
gene. (B) Beeswarm plots showing for each subclass the amount of variation 
explained per gene, indicated with a point, by donor on the top row and 
supertype on the bottom row. (C) Violin plots showing the percentage of 
variation explained per gene by each covariate for the Pvalb and L6b subclass. 
(D) Top genes whose variance can be associated with donor, supertype, ancestry,
and residuals for the Pvalb and L6b subclasses are shown in the paired boxplots.
(E) Top ranked genes with respect to variance explained by donor across all
subclasses are shown in the heatmap. Each row is a gene, and each column
is a subclass colored according to rank determined from the percent variance
explained by donor per subclass. (F) Stratification of gene expression by
donor for the top, median, and third quantile donor-associated genes colored
by red, purple, and blue, respectively. Donors are ordered by median expression 
for each gene. (G) Stratification of gene expression by supertype for the top, 
median, and third quantile supertype-associated genes colored by red, purple, and 
blue, respectively. Supertypes are ordered by median expression of each gene. 

Fig. 5. Cell type-specific gene expression variation is associated with 
genetic effects. (A) Scatterplot showing the relationship between number of 
cis-eQTLs (FDR < 0.05) on the y axis and number of nuclei on the x axis for each 
subclass. (B) Barplots showing the number of eGenes (FDR < 0.05) identified 
for each subclass, grouped by glutamatergic, GABAergic, and non-neuronal. 


morphoelectric properties (14, 15, 18). How- 
ever, deep-layer glutamatergic neurons develop 
earlier (49) and have longer and more exten- 
sive connections (50) than those of superfi- 
cial projection neurons, potentially exposing 
them to more diverse environmental factors 
and potentially diversifying gene patterns as 
the result of individual life experiences and 
sensory inputs. 

This study extends traditional brain atlas 
studies in several notable ways. First, it pro- 
vides a baseline for quantifying changes in cell 
type variation occurring with aging. Aging has 
been associated with cellular pathway dysreg- 
ulation, leading to increased transcriptional 
heterogeneity (38) and high variation in cell 
type abundances (19), both of which we also 
see in dementia. Second, genes with high sig- 
nificance for subclass or supertype and low 
significance for donor metrics likely represent 
robust gene sets that could be used for cell 
type identification using in situ spatial transcriptomic
methods such as multiplexed error-robust
fluorescence in situ hybridization
(MERFISH) (51). Third, this study represents a
very detailed eQTL analysis in adult human 
brains, providing an opportunity to associate 
disease-risk genes and variants from genome-wide
association studies (52) with gene expression
in specific subclasses. Similarly, computational
deconvolution (53) of eQTL analyses
performed on bulk tissue (21, 54) to resolve 
cell type signals can be refined by using the 
subclass-specific eQTLs reported here. Last, 
because morphoelectric features were collected 
from cells in adjacent sections, it will be pos- 
sible to associate gene expression and cellular 
function; for example, variability in ion chan- 
nel gene expression might link to electrophys- 
iological properties of cells from the same 
donors (18).

Use of neurosurgical tissue in this study in- 
troduces some limitations alongside the op- 
portunities presented. Although samples were 
largely nonpathological (18), some variation
was associated with the circumstances of tis- 
sue collection rather than biological states. 
Immune response genes vary by donor, per- 
haps indicating differential responses to stress 
from surgeries (55) or underlying pathologies 
(56), and two donors showed infiltration of 
tumor cells. Because tissue collection is de- 
signed to minimally affect surgical procedures, 
tissue size, shape, and precise anatomic loca- 
tion differ from case to case, potentially biasing 
the resulting cell type abundances. Similarly, 
(C) Enrichment of cis-eQTLs around the transcription start site of the proximal
gene for each subclass. (D) Examples of cis-eQTLs, FDR < 0.05 denoted by
an asterisk, with effects restricted to deep excitatory types and L6b. (E) Examples
of cis-eQTLs, FDR < 0.05 denoted by an asterisk, with cell type-agnostic and
associated effects on gene expression.


the precise dissection location within a corti- 
cal area may differ between individuals in a 
way that we do not have metadata to capture. 
However, because few genes show variation 
related to cortical area, and bulk transcriptomics
finds minimal gene expression differences
between adjacent neocortical samples
(57), any associated donor variation is likely 
relatively small. Tissue was primarily collected 
from donors of European ancestry, represent- 
ing a cross section of the population under- 
going neurosurgery in the local geographic 
region. Last, relatively few cells were collected 
per donor (4597 on average, from a single 
tissue section), limiting analyses for rare types. 
Together, these biases likely explain some of 
the increased variability in this study relative 
to previous cell typing studies in adults and 
aged, nondemented donors. Future studies 
from postmortem cases with carefully con- 
trolled tissue sampling or additional in situ 
labeling of cell types could help to distinguish 
biological from technical variation. 

The overlapping genes LRRC37A and ARL17B 
showed high interindividual variability and 
contained many eQTLs selectively for neu- 
ronal but not glial types. These genes are lo- 
cated at chromosome 17q21.31 alongside MAPT, 
which encodes tau protein, and are part of a
common ~900-kb inversion polymorphism 
that defines the H2 haplotypes of MAPT (47). 
The H1 haplotype (no inversion) has been 
implicated in multiple neurodegenerative 
diseases through aggregation of hyperphos- 
phorylated tau in neuronal cell bodies (58). 
LRRC37A showed lower expression in MTG 
and FRO of individuals with H1 versus H2 
haplotype in populations with European an- 
cestry (47), suggesting a possible protective 
effect. A recent study on Parkinson’s disease 
linked protective subhaplotypes with increased 
LRRC37A expression in brain, although LRRC37A 
was primarily expressed in astrocytes in the 
substantia nigra (59). These studies coupled 
with our results point to LRRC37A and ARL17B 
as potential candidates for consideration in 
studies of neurodegeneration. 

This study provides a genomic and tran- 
scriptomic overview of cortical cell type varia- 
tion in adult human individuals, identifying a 
highly consistent cellular makeup but with sub- 
stantial variation that reflects donor character- 
istics, disease condition, and genetic regulation. 


Materials and methods 
Neurosurgical-based cohort 


Overlying cortical tissue was collected from 
75 adult individuals undergoing neurosurgery 
for intractable epilepsy (n = 50), removal of
tumors (n = 23), or both (n = 2) (Fig. 1A and
table S1). Most tissue was from middle temporal
gyrus (MTG; n = 56) or temporal cortex
more broadly (n = 2), with the remaining tissue
samples from frontal (FRO; n = 13) or
other cortical areas (n = 4). Males represented
just over half of the donors from each medical 
condition. Tissue was collected for droplet-based
snRNA-seq and whole-genome sequencing
(WGS) from adjacent sections to those
processed for patch-seq studies (18) following
the published protocol (60) described below. 


Assessment of tissue quality 


We have previously shown that resected tissue 
collected from a partially overlapping set of 
individuals is largely neurotypical. Histolog- 
ical assessments of neurodegeneration, gliosis, 
and tumor infiltration show minimal indicators
of pathology (18). Patch-seq cells display
no obvious relationship between electrophysiology
or morphology and pathology, sex, or
age (18), or time in patch clamp solution (61).
Finally, gene expression from neurosurgical 
tissue has high fidelity, as collected nuclei 
show relatively few gene expression changes 
from postmortem tissue counterparts (15).


Experimental data collection 


Detailed descriptions of all experimental data 
collection methods in the form of technical 
white papers can also be found under “Documentation”
at http://celltypes.brain-map.org (18).

Human tissue acquisition and processing

Tissue specimens were obtained from local 
hospitals (Harborview Medical Center, Swedish 
Medical Center, and University of Washington 
Medical Center) in collaboration with local 
neurosurgeons. Tissue procurement from 
donors undergoing surgery was performed 
at hospitals, fully outside of the supervision 
of the Allen Institute. Tissue was provided to 
researchers under the supervision and authority
of the Institutional Review Board (IRB) of each
participating hospital. A hospital-appointed 
case coordinator obtained informed consent 
from all donors (table S1) before surgery. 
Patients who did not wish to participate in 
tissue donation or who were otherwise unable 
to provide written consent were excluded 
from the study. The specimens collected for 
this study were apparently non-pathological 
tissues removed during the normal course of 
surgery to access underlying pathological 
tissues. Tissue specimens used in the study 
were determined by medical staff to be non- 
essential for diagnostic purposes and would 
have otherwise been discarded. Tissue speci- 
mens were de-identified before receipt by Allen 
Institute personnel, but investigators were not 
otherwise blinded to the experimental condi- 
tion and no statistical methods were used to 
predetermine sample size. 

Immediately following resection, tissue 
was placed in artificial cerebral spinal fluid 
(ACSF) composed of (in mM): 92 N-methyl-
d-glucamine chloride (NMDG-Cl), 2.5 KCl, 1.2
NaH2PO4, 30 NaHCO3, 20 4-(2-hydroxyethyl)-
1-piperazineethanesulfonic acid (HEPES),
25 d-glucose, 2 thiourea, 5 sodium-l-ascorbate,
3 sodium pyruvate, 0.5 CaCl2.4H2O and
10 MgSO4.7H2O. Before use, the solution was
equilibrated with 95% O2, 5% CO2 and the pH
was adjusted to 7.3 by addition of 5N HCl
solution. Osmolality was verified to be between
295 and 305 mOsm/kg. Specimens were then
transported (15 to 35 min) from the hospital 
site to the laboratory for further processing. 

Acute brain slices (350 µm thickness) were
prepared with a Compresstome VF-300 (Preci- 
sionary Instruments) or VT1200S (Leica Bio- 
systems) vibrating microtome modified for 
block-face image acquisition (Mako G125B 
PoE camera with custom integrated software). 
Slices were transferred to oxygenated and 
warmed (34°C) ACSF as described above for 
10 min and were then transferred to room 
temperature holding ACSF composed of (in 
millimolar): 92 NaCl, 2.5 KCl, 1.2 NaH2PO4,
30 NaHCO3, 20 HEPES, 25 d-glucose, 2 thiourea,
5 sodium-l-ascorbate, 3 sodium pyruvate,
2 CaCl2.4H2O and 2 MgSO4.7H2O for
at least 2 hours prior to freezing the slices 
for downstream nuclear isolation. Slices for 
RNA-sequencing were snap frozen in micro- 
centrifuge tubes in a slurry of dry ice and 
ethanol and were stored at -80°C for later use. 

Nuclear isolation and single-nucleus
RNA sequencing

Nucleus isolation for 10x Chromium Single 
Cell 3’ RNA sequencing (v2, v3) was conducted 
as described (14). Briefly, snap frozen tissue
sections were removed from the -80°C freezer, 
thawed in homogenization buffer, and homo- 
genized in a Dounce to generate dissociated 
nucleus suspensions. Nuclear suspensions were 
stained with 4’,6-diamidino-2-phenylindole 
(DAPI) to discriminate nuclei from debris 
and mouse anti-NeuN antibody conjugated 
to PE (Millipore, Cat# FCMAB317PE, RRID: 
AB_10807694, 1:500 dilution) was applied to 
nuclear suspensions to discriminate neuronal 
(NeuN+) and non-neuronal (NeuN−) nuclei by
fluorescence-activated nuclear sorting (FANS).
Nuclei were sorted at a defined ratio of 80%
NeuN+ neuronal and 20% NeuN− non-neuronal
nuclei. After FANS, single-nucleus suspensions
were concentrated by centrifugation and
frozen in a solution of 1X phosphate-buffered
saline (PBS), 1% bovine serum albumin (BSA),
10% dimethylsulfoxide (DMSO) and 0.5% RNasin
Plus RNase inhibitor (Promega, N2611), and
stored at -80°C. At the time of use, frozen
nuclei were thawed at 37°C and processed for 
loading on the 10x Chromium instrument as 
described. Samples were loaded and processed 
using the 10x Chromium single-cell 3’ reagent 
kit v3 according to the manufacturer’s pro- 
tocol. The 10x Chromium single-cell 3’ kit v2 
was used to generate a single library from each 
of three donors (H16.03.010, H16.06.009, 
H16.06.011). For each donor, nuclei were 
loaded onto the 10x chip to target a maximum 
capture and sequencing of 10,000 single nu- 
clei. Gene expression was quantified using the 
default 10x Cell Ranger v3 (Cell Ranger,
RRID:SCR_017344) pipeline, except for substituting
the curated genome annotation used for
SMART-seq v4 quantification. Introns were
annotated as ‘mRNA’, and intronic reads were 
included to quantify expression. 


10x Chromium Whole Genome Sequencing 


Tissue sections for whole genome sequencing 
(WGS) were from the same brain tissue block 
as used for snRNA-seq data generation. Im- 
mediately after slicing, tissue sections for WGS 
were mounted on glass slides, snap frozen, 
and stored at -80°C in 50-ml conical tubes until
later use. To isolate genomic DNA (gDNA), 
tissue was scraped off the glass slide using a 
sterile razor blade into a microcentrifuge tube 
and lysed overnight at 56°C using a Qiagen 
MagAttract HMW DNA Kit (Qiagen, 67563). 
Tissue lysates were processed according to the 
manufacturer’s protocol, with several mod- 
ifications as outlined in the 10x Genomics 
Genome Reagent Kit v2 User Guide. Final 
gDNA samples were run on an Advanced 
Analytical Fragment Analyzer using the Ge- 
nomic DNA 50kb kit (Agilent, DNF-467) to 
measure gDNA concentration, assess the fraction
of gDNA >40kb in each sample, and determine
the quality of gDNA as measured using
the genomic quality number (GQN). gDNA
samples were diluted to 1 ng/µl and concentrations
were verified using the Qubit dsDNA
HS Assay kit ahead of 10x loading. Samples 
were loaded and processed with the 10x Chro- 
mium Genome Chip Kit v2 and the Chromium 
Genome Library and Gel Bead Kit v2 using 
Chromium i7 Multiplex Kit sample indices.
10x Chromium sample processing followed 
the manufacturer’s protocol. Samples were se- 
quenced on a NovaSeq 6000 instrument using 
a NovaSeq S4 flow cell aiming for at least 25X 
genome coverage per sample. 


Variant calling in whole-genome 
sequencing samples 


The processing of whole-genome sequencing 
(WGS) data was done using Simple Linux Utility 
for Resource Management (SLURM) on a high- 
performance computing (HPC) cluster envi- 
ronment. The fastq files from 10x Chromium 
WGS were aligned using Burrows-Wheeler 
Aligner (BWA, RRID:SCR_010910) mem to 
the human reference genome GRCh38-2.1.0. 
The bam files were processed using Picard’s
AddOrReplaceReadGroups to add read groups. 
Variants were called on the processed bam files 
per chromosome using Genome Analysis Toolkit, 
GATK4-4.0.3.0-0 HaplotypeCaller with -ERC 
flag (RRID:SCR_001876) to get Genomic Variant 
Call Format (g.vcf) files with homogenous reference
calls. The individual g.vcf files were combined
with GATK’s CombineGVCFs and translated
to vcf format with GATK’s GenotypeGVCFs. 
Variants with Phred-scaled quality score of 20 
or greater were retained to ensure only high- 
quality variant calls. 
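
The final filtering step can be sketched as follows, assuming a plain-text VCF in which the sixth tab-separated column is the Phred-scaled QUAL value. This is a minimal illustration rather than the pipeline actually used; the function name and the example records are hypothetical, and records with a missing QUAL (".") are dropped here by assumption.

```python
def filter_vcf_by_qual(lines, min_qual=20.0):
    """Yield VCF header lines unchanged and records with QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            # Malformed record or missing QUAL: dropped (an assumption).
            continue
        if float(fields[5]) >= min_qual:
            yield line


# Toy records (made-up positions and alleles) to show the behavior.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t100\t.\tA\tG\t35.2\t.\t.",
    "chr17\t200\t.\tC\tT\t12.0\t.\t.",
    "chr17\t300\t.\tG\tA\t20\t.\t.",
]
kept = list(filter_vcf_by_qual(example))
```

Because the threshold is inclusive, a record with QUAL exactly 20 is retained, matching "20 or greater" in the text.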


Reference MTG cell type definition and 
assignments for snRNA-seq data 
Previously defined MTG reference cell 
type assignments 


The human MTG classification (or “taxonomy”)
used in this study matches the one used to 
study aged donors as part of the Seattle 
Alzheimer’s Disease Brain Cell Atlas (SEA-AD; 
sea-ad.org). All data, metadata, cell type assign- 
ments, associated taxonomy files, and docu- 
mentation detailing protocols for each step are 
available at https://brain-map.org. In short, 
151 clusters were defined as described in a 
recent study of cell type conservation across 
great apes (62), using a combination of auto- 
mated and manual QC, Leiden clustering using 
Seurat, and merging of clusters with insuffi- 
cient evidence of differentially expressed genes. 
In addition to defining these high-resolution 
cell types, lower-resolution subclass and class 
assignments are defined as described previ- 
ously for mammalian primary motor cortex 
and match the published InterLex
(https://scicrunch.org/scicrunch/interlex/dashboard)
terms. Example subclass terms include “SST,”
“L6 CT,” and “Astrocyte,” while example class 
terms are one of “Neuronal: GABAergic,” “Neu- 
ronal: Glutamatergic,” or “Non-neuronal and 
Non-neural.” All cells passing QC are assigned 
to the same class, and nearly all are assigned 
to the same subclass across taxonomies, even 
though independent cell type assignments are 
generated for each study. 


Creation of MTG reference 
supertype annotations 


We defined “supertypes” as a set of fine-grained 
cell type annotations for single nucleus expres- 
sion data that could be reliably predicted on 
held-out reference data (where “ground truth” 
labels were assigned as described above) using 
a state-of-the-art machine learning approach
(23). From 5 neurotypical donors in the great
ape study with roughly 150K nuclei captured
with 10x snRNA-seq, we systematically held out
2 donors at a time and used single-cell anno- 
tation using variational inference (scANVI) to 
iteratively and probabilistically predict their 
class (3 labels), subclass (24 labels), and then 
cluster (151 labels). When predicting each nu- 
cleus’ class, we selected the top 2,000 highly 
variable genes along with the top 500 differ- 
entially expressed genes distinct to each class 
(calculated from the reference cells which had 
their labels retained using a Wilcoxon rank 
sum test) to use as features in training the 
model and specified the donor name and 
number of genes detected as categorical and 
continuous covariates, respectively. Nuclei were 
then separated by their predicted class and 
features were re-selected with the same crite- 
ria to predict subclasses and again in predict- 
ing clusters. A differential expression test was 
run on clusters with an F1 score below 0.7, and 
those without 3 positive markers when com- 
pared against nuclei from their constituent 
subclass (corrected P value <0.05, fraction in 
group expression >0.7, fraction out of group 
expression <0.3) were pruned from the taxon- 
omy. Of the 26 clusters flagged, 24 fell below 
these cutoffs and were pruned from the final 
supertype taxonomy. The remaining 2 (L2/3 
IT_2 and Oligo_3) were retained and recov- 
ered after supertype prediction (see below). 
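
The pruning rule described above can be written as a small decision function. The sketch below assumes that each candidate marker gene is summarized by a corrected P value and by the fraction of nuclei expressing it inside and outside the cluster; the dictionary keys and function names are hypothetical, and only the thresholds come from the text.

```python
def count_positive_markers(genes, max_p=0.05, min_in=0.7, max_out=0.3):
    """Count genes passing all three marker criteria from the text."""
    return sum(
        1
        for g in genes
        if g["p_adj"] < max_p and g["frac_in"] > min_in and g["frac_out"] < max_out
    )


def should_prune(f1_score, genes, f1_cutoff=0.7, min_markers=3):
    """Return True if a cluster would be pruned from the supertype taxonomy.

    Only clusters with F1 below the cutoff are tested; of those, clusters
    lacking `min_markers` positive markers are pruned.
    """
    if f1_score >= f1_cutoff:
        return False  # reliably predicted clusters are never flagged
    return count_positive_markers(genes) < min_markers


# Toy marker tables (values are illustrative only).
good = {"p_adj": 0.001, "frac_in": 0.9, "frac_out": 0.1}
weak = {"p_adj": 0.001, "frac_in": 0.5, "frac_out": 0.1}

pruned = should_prune(0.55, [good, good, weak])      # only 2 markers pass
retained = should_prune(0.55, [good, good, good])    # 3 markers pass
```

Under this reading, the 2 flagged clusters that were retained (L2/3 IT_2 and Oligo_3) correspond to the case in which F1 is below 0.7 but at least 3 positive markers are found.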


Pre-mapping quality control of 
isolated nuclei 


Nuclei with fewer than 1000 genes detected 
and doublet score > 0.3 (63) were filtered 
upstream of supertype mapping. Additional 
quality control flags for each cell were deter- 
mined using the public R package QCR (https:// 
github.com/AllenInstitute/QCR). These flags 
were then used as a baseline for additional 
manual filtering of low-quality cells and dou- 
blets. For most individuals <15% of nuclei were 
excluded, resulting in an average of 4,597 (IQR,
3344 to 5396) nuclei remaining per individual
(Fig. 1A).
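
The first filtering step can be sketched as a simple threshold check. The example assumes that a nucleus is removed when it fails either criterion (fewer than 1000 genes detected, or a doublet score above 0.3); the text joins the two conditions with "and", so this reading is an assumption, and the field names are hypothetical.

```python
def passes_premapping_qc(n_genes, doublet_score, min_genes=1000, max_doublet=0.3):
    """Return True if a nucleus is kept for supertype mapping."""
    return n_genes >= min_genes and doublet_score <= max_doublet


# Toy nuclei (illustrative values only).
nuclei = [
    {"id": "n1", "n_genes": 2500, "doublet_score": 0.05},
    {"id": "n2", "n_genes": 800, "doublet_score": 0.02},   # too few genes
    {"id": "n3", "n_genes": 4100, "doublet_score": 0.45},  # likely doublet
]

kept_ids = [
    n["id"]
    for n in nuclei
    if passes_premapping_qc(n["n_genes"], n["doublet_score"])
]
excluded_fraction = 1 - len(kept_ids) / len(nuclei)
```

The additional QCR flags and manual filtering described above are not covered by this sketch.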


Mapping isolated nuclei to 
reference supertypes 


After defining supertypes in a neurotypical 
reference, we iteratively and probabilistically 
predicted class, subclass, and supertype for 
our nuclei using scANVI. Briefly, each nucle- 
us’s class was predicted after projection into a 
shared latent space with reference nuclei using 
models trained with 2000 highly variable genes 
and 500 differentially expressed genes per class. 
The highly variable and differentially expressed 
genes were derived from the reference data. 
scANVI was provided donor name and number 
of genes as categorical and continuous covariates, 
respectively. Nuclei were then split by predicted 
class, projected into a refined class-specific 
latent space where class-specific subclasses were 
predicted, and similarly for subclass-specific 
supertypes. 


Post-mapping quality control using QCR 
and scANVI 


The subclass-specific latent spaces were then 
used to compute two-dimensional uniform 
manifold approximation and projections (UMAPs) 
and the scANVI predictions were evaluated by 
known marker gene expression (using signa- 
ture scores defined by differentially expressed 
genes in reference nuclei). In regions reference 
nuclei occupied there was strong agreement in 
signature gene expression with our nuclei, in- 
dicating accurate prediction. There was more 
variable expression in regions with poor refer- 
ence support (which also had higher uncertainty 
in their predictions). These areas represented 
either droplets with ambient RNA, multiple 
nuclei, dying cells, or transcriptional states 
missing from the reference, distinct to a donor 
or found only in disease. To triage these pos- 
sibilities, we used the QCR R package (https:// 
github.com/AllenInstitute/QCR) to fracture 
the graph into tens to hundreds of clusters 
(called “meta-cells”) using high-resolution 
Leiden clustering and then merged the clus- 
ters based on differential gene expression. 
Clusters and meta-cells were then flagged and 
removed if they had low within-group doublet 
scores, or number of genes detected, eliminating 
common technical sources of transcriptional 
heterogeneity. 


Expanding the reference taxonomy for 
non-neuronal cells 


With common technical axes of variation re- 
moved and to address the relative sparsity of 
NeuN-negative nuclei in this reference, we then sought 
to identify nuclei that were transcriptionally 
distinct from the reference and add them to 
our supertype taxonomy. We constructed a new 
latent space for each subclass using scVI (65), 
where the model was aware of the supertype 
prediction for each nucleus, gene dispersion 
was allowed to vary per supertype, donor name 
and sex were passed as categorical covariates, 
and the number of genes detected in each nu- 
cleus and the fraction of mitochondrial reads 
were passed as continuous covariates. Using 
the neighborhood graph from this latent space, 
we clustered the nuclei into tens to hundreds of 
groups and merged them based on differential 
gene expression using a Python version of 
scrattch.hicat (Scrattch.Hicat, RRID:SCR_018099, 
https://github.com/AllenInstitute/scrattch.hicat). 

We defined merged clusters with fewer than 
10% of all reference cells or of any single 
supertype as having limited reference support 
and added them to the taxonomy named as 
Subclass_Unknown_ClusterNumber. This process 
resulted in an additional set of 6 Astrocyte, 
1 Micro-PVM, 3 Oligodendrocyte, and 1 OPC 
transcriptionally distinct populations (fig. S1). 


RNA-seq data analysis 
Assessing gene variability across donors 


Differentially expressed genes between cell 
types or donor demographics (e.g., epilepsy 
versus tumor) were calculated on log2-normalized 
counts per million with the Wilcoxon 
test, using the FindMarkers function in the 
Seurat R library (65). Genes with higher within- 
donor than between-donor variance were iden- 
tified as follows: (i) for each subclass, select (up 
to) 24 cells per donor; (ii) calculate the average 
gene expression variance within each 
donor; (iii) repeat the first two steps 11 times 
for different sets of cells with real and permuted 
donor IDs; and (iv) compare the median 
real versus permuted variances. Significant 
genes were selected using an FDR threshold such that 
no genes showed larger within- than between- 
donor distance (FDR < 0.0043). Genes with 
the highest ratio of permuted to real variation 
have the largest donor effects and represent 
genes likely to be identified using variance 
partitioning (see below). Resulting differen- 
tial expression and high variance gene sets 
were tested for enrichment of gene ontology 
(GO) and other categories using the ToppGene 
Suite (66). 
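
The permutation comparison in steps (i) to (iv) can be sketched for a single gene as follows. This is a minimal pure-Python illustration and not the authors' implementation: the function names, the use of population variance, and the fixed random seed are assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict


def mean_within_donor_variance(values, donors):
    """Average, over donors, of the variance of one gene's expression within each donor."""
    by_donor = defaultdict(list)
    for v, d in zip(values, donors):
        by_donor[d].append(v)
    # Donors with a single cell carry no variance information and are skipped.
    variances = [statistics.pvariance(vs) for vs in by_donor.values() if len(vs) > 1]
    return statistics.mean(variances)


def real_vs_permuted(values, donors, n_repeats=11, max_cells=24, seed=0):
    """Median within-donor variance using real and shuffled donor labels."""
    rng = random.Random(seed)
    idx_by_donor = defaultdict(list)
    for i, d in enumerate(donors):
        idx_by_donor[d].append(i)
    real, permuted = [], []
    for _ in range(n_repeats):
        # (i) subsample up to `max_cells` cells per donor
        keep = []
        for idx in idx_by_donor.values():
            keep.extend(rng.sample(idx, min(max_cells, len(idx))))
        vals = [values[i] for i in keep]
        labels = [donors[i] for i in keep]
        # (ii) within-donor variance with the real labels
        real.append(mean_within_donor_variance(vals, labels))
        # (iii) the same statistic after shuffling donor labels
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permuted.append(mean_within_donor_variance(vals, shuffled))
    # (iv) compare the medians of the real and permuted variances
    return statistics.median(real), statistics.median(permuted)
```

A gene with a strong donor effect gives a permuted variance well above the real one, which corresponds to the high permuted-to-real ratio described in the text.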


Assessing cell type variability across donors 


Cell type abundances were calculated sepa- 
rately for glutamatergic, GABAergic, and non- 
neuronal cell populations to account for biases 
intentionally introduced through FANS, with 
abundances in each group summing to 100 per- 
cent per donor. Changes in abundances be- 
tween donor demographics (e.g., epilepsy versus 
tumor) were calculated using an ANOVA cor- 
rected for multiple comparisons using the false 
discovery rate method. 
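
The per-group normalization described above can be sketched as below. The input layout (a list of group and cell type pairs for one donor) and the function name are assumptions chosen for the illustration; the statistical comparison between donor groups is not shown.

```python
from collections import Counter, defaultdict


def abundances_by_group(cells):
    """Percent abundance of each cell type within each group for a single donor.

    `cells` is an iterable of (group, cell_type) pairs, where group is, for
    example, "glutamatergic", "GABAergic", or "non-neuronal".
    """
    per_group = defaultdict(Counter)
    for group, cell_type in cells:
        per_group[group][cell_type] += 1
    result = {}
    for group, counts in per_group.items():
        total = sum(counts.values())
        # Abundances within one group sum to 100 percent for this donor.
        result[group] = {ct: 100.0 * n / total for ct, n in counts.items()}
    return result
```

Normalizing within each group separately means that a sorting bias toward one group does not change the relative abundances reported inside the other groups.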


Random forest prediction of donor ID 


Random forest prediction was used to assess 
the amount of intra-donor versus inter-donor 
variation within each subclass by asking for 
what fraction of cells can the assigned cell type 
be accurately predicted. The specific algorithm 
is as follows: (i) for each subclass, select (up to) 
24 cells per donor; (ii) run random forest 
prediction (57) using 75% training, 25% test 
four times to get predictions for all 24 cells; 
and (iii) repeat the first two steps 10 times to 
calculate the mean and standard deviation of RF 
prediction scores per subclass, per donor. These 
steps were performed using the 2000 most 
highly variable genes (main text) and also using 
the first 30 principal components (supplementary 
materials). These classification accuracies were 
also calculated in data where donor IDs were 
randomized within subclass for comparison. 
Donors and subclasses were then hierarchically 
clustered by average prediction accuracy, and 
groupings of subclasses and donors with respect 
to metadata were assessed using a multivariate 
hypergeometric distribution. 
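
Running a 75% training and 25% test split four times so that every cell is predicted once amounts to four-fold cross-validation. The sketch below shows only the index bookkeeping under that interpretation; the random forest itself is omitted, and the function name and seed are hypothetical.

```python
import random


def four_fold_splits(n_items, n_folds=4, seed=0):
    """Yield (train_idx, test_idx) pairs so each item is in a test set exactly once."""
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)
    # Deal the shuffled indices into `n_folds` folds of (nearly) equal size.
    folds = [order[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

With 24 cells per donor, each split trains on 18 cells and tests on the remaining 6.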


Assessing donor entropy across 
cell types 


Donor entropy per subclass was used to assess 
the amount of alignment between donors. The 
nearest 100 neighbors for each cell were 
computed using “spatial.cKDTree” from the scipy 
library. The donor annotations of the 
100 nearest neighbors were summarized as 
proportions that were used to compute donor 
entropy with the “stats.entropy” function from 
the scipy library. We repeated this procedure 
for each cell in each subclass. 
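
The donor entropy calculation can be illustrated with a dependency-free sketch. It replaces the k-d tree with a brute-force neighbor search and computes Shannon entropy directly with the natural logarithm, which is also the default base in scipy's entropy function. Excluding each cell from its own neighbor list and capping `k` for small inputs are assumptions of this example, as are the function names.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (natural log) of the label proportions."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def donor_entropy(points, donors, k=100):
    """Entropy of donor labels among each cell's k nearest neighbors.

    Brute-force Euclidean search; `k` is capped at the number of other cells.
    """
    k = min(k, len(points) - 1)
    entropies = []
    for i, p in enumerate(points):
        # Distances from cell i to every other cell, closest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbor_donors = [donors[j] for _, j in dists[:k]]
        entropies.append(shannon_entropy(neighbor_donors))
    return entropies
```

High entropy indicates that a cell's neighborhood mixes many donors; an entropy of zero indicates that all neighbors come from a single donor.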


Variance Partition analysis 


Variation partitioning analysis was used to 
prioritize the drivers of variation within each 
subclass. Using linear mixed-effect models 
implemented in the variancePartition Bioconductor 
package (http://bioconductor.org/packages/variancePartition) 
(67), we identified the genes whose variance is best 
explained by donor, supertype, brain region, 
demographic factors (ancestry, age, sex), disease 
state (epilepsy versus tumor), and technical 
factors (batch). The specific algorithm is as follows: 
(i) For each subclass, we filtered to cells passing 
QCR flags and removed any clusters with a 
single member. (ii) Genes were removed from 
the analysis if they were not expressed in 
> 10 cells, had greater than 80% dropout, 
had zero variance, had an average expression of 
less than 1 CPM, or were in the ZNF or 
LOC families. (iii) The variance partitioning 
linear model was then defined as: 


gene ~ age + (1|condition) + (1|sex) 
+ (1|donor) + (1|supertype) + (1|batch) 
+ (1|brainregion) + (1|ancestry) 


and passed into the variancePartition function 
“fitVarPartModel.” We determined the amount 
of variation explained per covariate for each 
gene with the “extractVarPart” function. 
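
The gene filters listed in step (ii) can be sketched as a single predicate. This is an illustration only: the function name and arguments are hypothetical, and membership in the ZNF or LOC families is approximated here by a gene symbol prefix, which is an assumption rather than the authors' stated rule.

```python
import statistics


def keep_gene(name, counts, cpm, min_cells=10, max_dropout=0.8, min_mean_cpm=1.0):
    """Return True if a gene passes the filters listed in the text.

    counts: raw counts per cell; cpm: counts per million per cell.
    """
    n_cells = len(counts)
    n_expressing = sum(1 for c in counts if c > 0)
    if n_expressing <= min_cells:
        return False  # not expressed in more than 10 cells
    if 1 - n_expressing / n_cells > max_dropout:
        return False  # greater than 80% dropout
    if statistics.pvariance(cpm) == 0:
        return False  # zero variance
    if statistics.mean(cpm) < min_mean_cpm:
        return False  # average expression below 1 CPM
    if name.startswith(("ZNF", "LOC")):
        return False  # assumed proxy for ZNF and LOC family membership
    return True
```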


Comparison with SEA-AD data 


SnRNA-seq data collected from the MTG of 
each donor along with relevant metadata was 
downloaded from sea-ad.org. Metadata for 
each cell was downloaded from the AWS link to 
processed data (https://registry.opendata.aws/allen-sea-ad-atlas), 
and .h5ad files containing gene counts for each 
subclass were downloaded from CELLXGENE 
(https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30). 
We note that data from 
CELLXGENE and AWS are identical, but the 
full data matrix on AWS is not readable with 
R. To assess abundance in this cohort, we re- 
peated the analysis described in “Assessing 
cell type variability across donors” for the 
SEA-AD cohort as a whole and for the non- 
demented and demented donors separately, 
and also defined coefficient of variation of 
abundances across donors. To define gene ex- 
pression variation, we repeated the analysis 
described in “Assessing gene variability across 
donors” section for the subset of overlapping 
genes in the two SEA-AD groups and a restricted 
set of 45 individuals in this study who underwent 
epilepsy surgery in MTG. Violin plots of gene 
expression were created using Seurat. 


Whole-genome sequencing analysis 
eQTL analysis 


SnRNA-seq data collected from the MTG 
of 84 aged donors along with relevant meta- 
data was downloaded from sea-ad.org. For 
eQTL analysis we filtered to the following 
SNPs: (i) minor allele frequency (MAF) < 5%, 
(ii) significant in an eQTL from ROSMAP, 
CommonMind or brain tissue from GTEx. 
The .vcf files containing SNP data were converted 
to dosage files using samtools -dosage 
plugin (SAMTOOLS, RRID:SCR_002105). The 
gene expression matrix was filtered to remove genes that 
(i) were not expressed in > 10 cells, (ii) had greater 
than 80% dropout, (iii) had zero variance, 
(iv) had an average expression of less than 1 CPM, 
or (v) were in the ZNF or LOC families. 
After filtering both the SNP and gene 
expression data we used the Matrix eQTL 
software package (67) to identify significant 
eQTLs. In the Matrix eQTL model we included 
the top 3 genotype PCs to account for an- 
cestry differences as well as technical covar- 
iates including the tissue collection site. 
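
Matrix eQTL tests each SNP and gene pair with a linear model. As a simplified illustration of the core association, the sketch below regresses expression on allele dosage for a single pair. It deliberately omits the covariates named above (genotype PCs and tissue collection site) and any significance testing; the function name is hypothetical.

```python
import statistics


def dosage_effect(dosages, expression):
    """OLS slope and Pearson correlation of expression on allele dosage (0, 1, or 2)."""
    mx = statistics.mean(dosages)
    my = statistics.mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx              # change in expression per additional allele
    r = sxy / (sxx * syy) ** 0.5   # strength of the linear association
    return slope, r
```

In a full analysis the covariates would be regressed out or included in the model before the slope is interpreted.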


Acquiring ROSMAP eQTL 
summary statistics 

We obtained eQTL summary statistics for the 
ROSMAP study from the xQTL analysis per- 
formed in Ng et al. 2017. We downloaded the 
summary tables for significant eQTLs from the 
updated xQTL statistics (2021) hosted on 
http://mostafavilab.stat.ubc.ca/xqtl. 


Acquiring CommonMind eQTL 
summary statistics 


We obtained eQTL summary statistics for the 
CommonMind study from the CommonMind 
Consortium Knowledge portal Synapse project 
(CommonMind Consortium, RRID:SCR_000139, 
https://www.synapse.org/#!Synapse:syn4622659). 


Acquiring GTEx eQTL summary statistics 


We obtained eQTL summary statistics for brain 
tissues from version 8 of the GTEx study at: 
https://www.gtexportal.org/home/datasets 
where we downloaded the following file: 
https://storage.googleapis.com/gtex_analysis_v8/single_tissue_qtl_data/GTEx_Analysis_v8_eQTL.tar. 


Molecular programs of regional specification and neural 
stem cell fate progression in macaque telencephalon 

Suvimal Kumar Sindhu, Jon I. Arellano, Tianliuyun Gao, Mikihito Shibata, 
Kevin T. Gobeske, Alvaro Duque, Gabriel Santpere, Nenad Sestan*, Pasko Rakic* 


INTRODUCTION: Telencephalic development in- 
volves intricate molecular processes that regu- 
late the spatial identity of the neural cells over 
time. Early patterning centers specify neural 
stem cells (NSCs), which generate neurons and 
glia. We explored these events in macaque mon- 
keys and human brain organoids, using single- 
cell transcriptomics. 


RATIONALE: The molecular mechanisms pivo- 
tal to brain development have been character- 
ized in rodents but remain elusive in primates, 
limiting our comprehension of the origins and 
dysfunctions of higher-level cognitive abil- 
ities. Moreover, in primates, including humans, 
a dearth of information remains on the early 
molecular events underlying the diversification 
of telencephalic regions and cortical areas, as 
NSCs traverse neuronal and glial trajectories. 


RESULTS: We dissected multiple regions of rhesus 
macaque prenatal telencephalon and conducted 
single-cell RNA sequencing on over 761,000 cells, 
from the phase before neurogenesis through to 
mid-gliogenesis. We distinguished transcripto- 
mically defined cell subtypes representing dorsal 
and ventral NSCs, excitatory and inhibitory 
neurons, glial cells, and non-neural cells and 
characterized their molecular dynamics across 
regions as they navigated lineage trajectories. 

We identified putative early progenitors of 
telencephalic organizing centers and predicted 
transcription factors (TFs) that target fibroblast 
growth factor (FGF) signaling-related genes 
in the anteroventral domain, and their signaling 
cross-talk with regional NSCs. Transcriptomic 
comparisons with mouse revealed primate- 
biased expression of galanin-like peptide (GALP) 
in the anteroventral organizer. Furthermore, 
we observed that GALP ligand enhanced corti- 
cal NSC proliferation in human and monkey 
brain organoids and in injected mouse embryos. 

Cortical NSCs displayed regional diversifi- 
cation in early subtypes and in outer radial 
glial (RG) cells. Moreover, we identified region- 
specific gene expression cascades throughout 
the NSC state transitions, including retinoic acid 
(RA) signaling components in the frontal region. 
Along the excitatory neuronal lineage, regional 
variance became pronounced late during matu- 
ration, when synaptic connections are estab- 
lished and laminar neuronal identities refined. 
However, putative cell-autonomous mechanisms 
may also contribute to neuron regional diversity. 
This was evident in vitro, where cortical NSC- 
derived excitatory neurons expressed region- 
specific genes, including the prefrontal-enriched 
RA signaling-associated CBLN2, even in the 
absence of outside synaptic inputs. By contrast, 
interneurons appeared more transcriptomically 
homogeneous across regions. Lastly, transcriptomic 
divergence in astrocytes emerged during 
late differentiation, wherein they displayed 
region-specific competence in responding to 
vascular signals, including Notch ligands. 

Regionalization of macaque telencephalon. Single-cell transcriptomics of prenatal macaque telencephalic 
regions distinguished subtypes spanning development. The transcriptome of putative organizers and their 
TF network were defined. Primate-expressed GALP modulates NSC proliferation in vitro and in vivo. 
Differentially expressed gene (DEG) analysis revealed regional divergence along NSC progression, 
neurogenesis, and gliogenesis. Expression of brain disease-associated genes in organizers denotes risk 
during early development. 

Whereas previous studies have reported neuro- 
psychiatric disorder-associated genes that func- 
tion in prenatal and/or adult neuronal and glial 
cells, we found risk genes that are expressed in the 
patterning centers and during the progression 
of dorsal and ventral NSCs. Thus, these disorders 
may have an even earlier developmental 
origin, implicating dysfunctional telencephalic 
patterning and altered spatiotemporal identity 
of the RG cells. 


CONCLUSION: We have elucidated the molecu- 
lar mechanisms underlying the progression of 
NSC identity and the regional specification of 
neurons and glia during macaque telencephalic 
development, highlighting early events as 
possible risk factors for brain disorders. This 
resource enables further exploration of pri- 
mate and human brain development, evolution, 
and diseases. 

During early telencephalic development, intricate processes of regional patterning and neural stem cell 
(NSC) fate specification take place. However, our understanding of these processes in primates, 
including both conserved and species-specific features, remains limited. Here, we profiled 761,529 
single-cell transcriptomes from multiple regions of the prenatal macaque telencephalon. We deciphered 
the molecular programs of the early organizing centers and their cross-talk with NSCs, revealing 
primate-biased galanin-like peptide (GALP) signaling in the anteroventral telencephalon. Regional 
transcriptomic variations were observed along the frontotemporal axis during early stages of neocortical 
NSC progression and in neurons and astrocytes. Additionally, we found that genes associated with 
neuropsychiatric disorders and brain cancer risk might play critical roles in the early telencephalic 
organizers and during NSC progression. 


The development of the telencephalon 
entails a complex interplay of molecular 
processes that govern the specification of 
distinct regions by early organizing centers 
and the commitment of radial glial 
(RG) cells, which function as neural stem cells 
(NSCs) (1-6). In the pallium, RG cells generate 
glutamatergic excitatory neurons, eventually 
becoming gliogenic. Whereas these mecha- 
nisms have been extensively characterized in 
rodents (7), they remain elusive in primates 
(8, 9), limiting our understanding of the ori- 
gins and dysfunctions of higher-order cogni- 
tive abilities. 

Substantial progress has been made in elu- 
cidating the gene networks that guide human 
and nonhuman primate brain development 
by bulk tissue (10-14) and single-cell genomic 
profiling (15-17). Nonetheless, the early molec- 
ular events governing the spatiotemporal pro- 
gression of NSCs and the diversification of 
telencephalic regions and cortical areas, as cells 
traverse neuronal and glial trajectories, are 
not fully understood. 

We conducted single-cell RNA sequencing 
on more than 761,000 cells of multiple regions 
of prenatal rhesus monkey telencephalon, rang- 
ing from the early phases when the organizers 
pattern different regional anlage before neuro- 
genesis through to mid-gliogenesis. We performed 
comparison with mouse datasets, revealing 
primate-biased expression of GALP neuro- 
peptide in the anteroventral organizer, and 
evaluated its function in brain organoids. In 
addition, we defined the gene expression cas- 
cades underlying the early spatial divergence 
of the NSCs and the regional specification of 
cortical neurons and glia. Finally, we mapped the 
developmental expression of neuropsychiatric 
disorder- and brain cancer-associated genes, 
showing that risk genes might have putative 
early roles in the telencephalic organizers and 
across NSC progression. 


Spatiotemporal transcriptomic 
characterization of prenatal macaque 
telencephalic cells 


We conducted single-cell RNA sequencing 
(scRNA-seq) on 82 samples collected from 
multiple prospective regions of 12 prenatal 
rhesus macaque telencephalons, from embryonic 
day 37 (E37), prior to neurogenesis, until mid- 
gliogenesis, at E110 (14) (Fig. 1A and fig. S1A). 
The ganglionic eminence (GE), anterior (A)/ 
frontal (FR), dorso-lateral (DL)/putative motor- 
somatosensory (MS), posterior (P, temporo- 
occipital)/occipital (OC), and putative temporal 
telencephalic walls were recognized at E37- 
E78. More-refined areas were distinguished at 
E93 and E110 from GE, prospective prefrontal 
(PFC), primary motor (M1C), parietal (Par), primary 
visual (V1C), insula (Ins), and temporal 
(Tem) cortical walls (18). Stringent quality control 
resulted in 761,529 high-quality cells (fig. 
S1B and table S1). 

Unsupervised clustering and marker gene 
profiling identified 112 transcriptomically de- 
fined cell subtypes, including progenitors of pu- 
tative telencephalic patterning centers; dorsal 
and ventral NSCs traversing excitatory and in- 
hibitory neurogenic lineages, respectively; glio- 
genic lineages; and non-neural cells, which 
paralleled the sample spatiotemporal informa- 
tion on the uniform manifold approximation 
and projection (UMAP) layout of cells (Fig. 1B; 
fig. S2, A to E; and table S2). 

This resource depicting single-cell spatio- 
temporal transcriptomic dynamics of the de- 
veloping macaque telencephalon is accessible at 
http://resources.sestanlab.org/devmacaquebrain. 


Transcriptomic signatures of macaque 
telencephalic patterning centers 


We identified early domain-specific SOX2+/NES+ 
neuroepithelial progenitors representing puta- 
tive telencephalic patterning centers (PCs, also 
called organizers) (3, 4, 19) (Fig. 2, A to C; fig. S3, 
A to C; and tables S3 and S4). Three coclustering 
subtypes, detected in the anterior region, 
showed FGF8/17/18, SP8, FOXG1, NKX2-1, or 
SHH expression, representing putative anterior 
neural ridge/rostral patterning center (RPC) 
(PC FGF17) and anteroventral (AV) progenitors 
(AV NKX2-1/NKX6-2 and AV NKX2-1/ 
LMO1) (20, 21). Two NKX2-1+ subtypes found 
in the GE coclustered with the AV progenitors, 
likely representing the organizer of the ventral 
forebrain (4, 19). ZIC genes (ZIC1/3/4) were 
detected in these AV subtypes, forming a ven- 
trodorsal and anteroposterior gradient along 
with the AV patterning genes expressed in this 
domain. Two subtypes in the medioposterior domain 
expressing LMX1A, WNT (RSPO3, WNT8B), 
and BMP (BAMBI) signaling members likely 
represent two states of the dorsocaudal orga- 
nizer, namely hem (PC RSPO3) and hem/choroid 
plexus epithelium (CPe, PC TTR) (22, 23), whose 
domain was further characterized by ARX, 
FGFR3, and LHX9 expression (Fig. 2, A to C, 
and figs. S3, B and C, and S5E). Other posterior 
subtypes included one putative zona limitans 
intrathalamica (ZLI) (PC TCF7L2) and an anti- 
hem (PC SFRP2) (24). These cells were tran- 
sient and barely detectable after E43 (Fig. 2A), 
and their identities were further validated by 
transcriptomic comparisons with a mouse (25) 
and a macaque dataset (14) (fig. S3, D and E). 
Organizers secrete morphogens that induce 
gene expression gradients across the telenceph- 
alon (1, 4, 20, 23, 26). Integrating motif en- 
richment and gene coexpression, we predicted 
regulatory networks that connect transcription 
factors (TFs) with their putative target genes, 
including signaling components recruited across 
different domains (Fig. 2D and table S5). The RPC 
subtype putatively employed TF-target regula- 
tion involving ZIC genes upstream of fibroblast 
growth factor (FGF) pathway-related genes, 
including FGF3/8/17/18 and SPRY2, whereas 
AV and GE progenitors exhibited both overlapping 
and subtype-specific regulation, including 
NKX2-1 linked to SHH through ZNF219. 
Likewise, posterior organizer subtypes showed 
domain-specific regulation, including NFIX upstream 
of Notch and FGF signaling genes, as 
well as domain-shared regulation, such as ARX 
and LHX9. Different elements of the same signaling 
pathways exhibited divergent spatial 
activation (Fig. 2D and fig. S4, A and B). For example, 
the BMP signaling members RGMA and 
FSTL1 were recruited anterior-ventrally, whereas 
LEF1 was in the hem. Together, these data denote 
the combinatorial interaction of TFs with 
signaling molecules, orchestrating the patterning 
action of the organizer centers (1, 27). 

Fig. 1. Cell atlas of macaque telencephalon. (A) Age scheme and representative macaque brains illustrating the regions and areas dissected. Scale bar, 1 cm. 
(B) From innermost to outermost circle: UMAP visualizing cell classes; subtype proportions; marker expression; region and age composition; cell classes; and 
subtypes. The region nomenclature is based on the temporal development of the telencephalon (bottom left). 

Fig. 2. Molecular signatures of putative telencephalic organizers. (A and B) Organizer cell subtypes (A) and their markers (B). (C) RNAscope of macaque sagittal 
brain sections. Scale bar, 500 μm (panoramic) and 200 μm (zoom-in). A, anterior; D, dorsal; GE, ganglionic eminence; LV, lateral ventricle; P, posterior. (D) Predicted TF 
regulatory network with nodes colored by subtype (left) or signaling (right). 

Finally, RNA velocity inferred the potential 
neuronal lineage of the AV and cortical hem progenitors, 
generating LHX8+ and ONECUT1+/ 
ONECUT2+ or TP73+ Cajal-Retzius neurons, respectively 
(fig. S4C). Taken together, these data 
highlight the molecular events underlying primate 
telencephalic organizer activities. 


Signaling interaction between telencephalic 
organizers and NSCs 


To define how the regional identity of the 
telencephalic RG cells is instructed by the 
organizers, we leveraged annotated signal- 
ing pathway-related ligand-receptor (L-R) 
pairs, inferring cell interactions between organizer 
subtypes [RPC (FGF17+), AV (NKX2-1+), 
hem (RSPO3+), and hem/CPe (TTR+)] expressing 
the ligands and region-specific NSCs expressing 
the receptors and clustering them into 
modules (M1 to M10) on the basis of cross-talk 
patterns (Fig. 3, A and B, and fig. S5, A to D). 
For example, M6 largely consisted of L-R pairs 
of FGF signaling and was characterized by 
FGF18-FGFR1 and FGF18-FGFR3 pairs, predicting 
signaling from the RPC expressing 
FGF18 ligand toward anterior, posterior, and 
ventral NSCs expressing the receptor FGFR1, 
or selectively posterior and ventral NSCs expressing 
FGFR3. Other modules consisting 
of WNT (WNT5A-Frizzled receptor FZD5, M2; 
WNT5A-WNT signaling modulator PTPRK, M3) 
and BMP signaling-related L-R pairs predicted 
cross-talk between the cortical hem and region- 
specific NSCs. RNAscope supported the predicted 
interactions, outlining spatial expression 
patterns in the expected domains (Fig. 3B 
and fig. S5E). These data strongly suggest that 
brain organizers selectively signal to competent 
NSCs (27). Moreover, the results show 
regional expression of morphogens and paired 
receptors in macaques, supporting the hypothesis 
of a signaling code integrating the cross-talk 
between organizers and region-specific NSCs. 

Fig. 3. Signaling cross-talk between organizers and NSCs. (A) L-R pair 
modules (M) for selected signaling pathways (i) mediating organizer-NSC cross- 
talk at E37-43 (ii). (B) Directed L-R mediated interactions (i). RNAscope of 
macaque sagittal brain sections (ii). Scale bar, 500 μm (panoramic) and 200 μm 
(zoom-in). (C) Organizer markers enriched in macaque versus mouse (25). 
(D) RNAscope of macaque and mouse sagittal brain sections. Scale bars as in 
(B). (E) Immunohistochemistry of hCO exposed to FGF8 +/- GALP (left) and 
high-throughput quantification (±SD, right). Scale bar, 100 μm. One-way analysis 
of variance, Dunnett's multiple comparison (**P < 0.01; ns, not significant). 
DIV, day in vitro. 


Putative primate-biased proliferation signaling 


Transcriptomic comparisons between macaque 
and mouse brain organizers (25) revealed similarities 
as well as notable differences between 
these species (Fig. 3C and fig. S6, A and B). 
Genes enriched in macaque organizers included 
the neuropeptide galanin-like peptide 
(GALP), known to be involved in hypothalamic 
functions in adult rodents (28). In macaque, we 
found that GALP was expressed by the RPC 
progenitors and their putative progeny lineages. 
Moreover, its expression decreased after E43, 
indicating a transient function at early phases 
(Fig. 3C and fig. S6C). RNAscope analysis validated 
the expression of GALP and its family- 
related Galanin (GAL) in the anteroventral 
domain of E40 monkey telencephalon, whereas 
they were not detected in mouse at the equivalent 
ages E9.5 and E11.5 (Fig. 3D). GALP/GAL 
receptor 2 (GALR2) was also evident in the mon- 
key but weaker in the mouse telencephalon. 
Human cortical (hCO) and medial GE (hMGEO) 
organoids were generated and further directed 
toward dorsocaudal or anteroventral identity, 
modulating, respectively, RSPO3 and FGF8 signaling 
during the patterning phase (29). Standard 
markers confirmed their regional bias; in 
addition, we observed higher expression of GAL, 
GALP, ZIC4, and SP8 in hMGEO than in hCO, 
suggesting that these are intrinsic features of 
anteroventral neural cells (fig. S7, A and B). Fur- 
thermore, prolonged exposure of rhesus ma- 
caque cortical organoids (rmCO) to GALP, GAL, 
or both ligands increased NSC proliferation 
and compromised neuronal differentiation (fig. 
S7, C and D). 

Finally, exogenous GALP increased the number 
of Ki67+ and SOX2+ cells, which indicate 
proliferation and stem cell identity, respectively, 
in hCO with anterior identity (hCO+FGF8) rather 
than in anteroventral hMGEO (hMGEO+FGF8) 
or dorsocaudal hCO (hCO+RSPO3) 
(Fig. 3E and fig. S7E). Moreover, in utero 
intraventricular injection of GALP ligand in 
E11.5 mouse embryos, followed by 5-ethynyl- 
2'-deoxyuridine (EdU) incorporation, resulted 
in a higher proportion of EdU+, Sox2+, and Ki67+ 
cells in the rostral-medio dorsal telencephalon, 
but not in caudal or ventral areas, relative to 
the phosphate-buffered saline (PBS)-injected 
controls, at E12.5 (fig. S8, A to C). Together, 
these data indicate that GALP preferentially 
induces proliferation of cortical RG cells with 
anterior identity. 


Transcriptomic divergence in NSC progression 
across cortical regions 


On the basis of marker gene expression, cor- 
tical NSCs were distinguished into multiple 
subtypes showing different regional propor- 
tion and whose appearance correlated with 
developmental ages: two neuroepithelial stem 
cell (NESC) subtypes, two early (vRG_E) and 
one late (vRG_L) ventricular RG cell subtype, 
two truncated RG (tRG) cell subtypes, one 
ependymal subtype, and two outer RG (oRG) 
subtypes (Fig. 4, A and B, and fig. S9, A and B). 
Pseudotime analysis further defined the pro- 
gression of the ventricular NSCs up to epen- 
dymal cells, distinguishing the oRG cell lineage 
(fig. S9C). Transcriptomic comparisons with 
human developing brain scRNA-seq datasets 
(15, 16) confirmed our annotation and identi- 
fied earlier NSC states across the telencephalic 
regions in our dataset (fig. S9D). Most of these 
subtypes were found in all four main regions 
analyzed; however, an early RG cell subtype 
(vRG_E PMP22+), highly expressing CYP26A1 
and ZIC1/3/4, was found selectively enriched 
in the anterior region (Fig. 4B). 

Within the region-shared subtypes, expres- 
sion changes detected along the progression of 
ventricular NSCs and oRG cells were, for example, 
in chromatin remodeling factors (HMGA2 
and JARID2), during the transition of the NESCs 
into vRG cells; in cilia-related genes (FOXJ1) in 
the observed transition from tRG to ependymal 
cells; and in cell interaction (neurexins) and 
angiogenesis (VEGFA) genes, along the oRG 
cell development (fig. S9, E and F). Thus, this 
analysis determined the gene cascades under- 
lying key cortical NSC state transitions. 

Analysis of differentially expressed genes 
(DEGs) between the regions and along the 
progression of the NSCs of the ventricular 
zone (VZ; also called ventricular or apical 
progenitors) and sub-ventricular zone (SVZ; 
also called basal progenitors) showed accen- 
tuated area diversification in the early NSCs 
(NESCs and vRGE) and in the late oRG cells
(Fig. 4C). Within the ventricular NSC progres- 
sion, regionally enriched gene expression cas- 
cades were prominent at early phases of the 
anterior/frontal cells and included TFs such 
as ZNF219, ZIC1/2/3/4/5 and SOX21, WNT members
(WNT7B, WNT8B), and retinoic acid (RA)
signaling components (CYP26A1 and RBP1)
(Fig. 4D, fig. S10A, and table S6). Some oRG
marker genes (HOPX, PTN, FABP7, and PMP22)
were enriched in these NSCs, representing dis- 
tinctive features of this early anterior population; 
however, their expression became regionally 
comparable in more mature states (fig. S10, B 
and C). Both early posterior/occipital and tem- 
poral regions displayed higher expression of 
NR2F1/2, FGFR3, WNT (RSPO3), and Notch 
(HES5) signaling members (Fig. 4D and figs.
S3C, S5E, and S10A). 

We also found expression enrichment of 130 
genes shared by early anterior/frontal and 
posterior/occipital NSCs and 16 other genes, 
including the neuropeptide PENK, whose ex- 
pression shifts from early anterior/frontal to 
late occipital NSCs (fig. S10, D to G, and table S7).

Along the oRG cell lineage, DEGs across 
regions included RBP1 in the frontal; MEF2C
and NPY neuropeptide in the occipital; and the
BDNF receptor NTRK2 in the temporal (fig.
S10, H and I, and table S8). Together, these re- 
sults define temporally regulated region-specific 
gene expression patterns along apical and basal 
NSC progression. 

Finally, vRGL, rather than early NSCs (NESC
and vRGE), showed positive region-identity correlation
with oRG cells, even more pronounced
frontally, with few genes (for example, RBP1,
ZIC1, and DCT) consistently expressed in all
frontal NSC subtypes (Fig. 4E). This suggests 
that region-specific molecular programs of 
apical NSCs differ during early phases from, 
but are later relayed to, oRG cells. However, all 
the frontal apical NSCs and oRG cells might 
share molecular mechanisms, including RA 
signaling response. 

RNAscope of monkey tissue validated the 
expression pattern of several TFs at E40, E52, 
and E76, such as ZIC4, SP8, NKX2-1, LHX9, 
FEZF1, and NR2F1, which decreased over time
in their respective regions, and others, such as
ZIC1/3, MEIS2, and PBX1, whose expression
increased, spreading from the anteroventral 
to the anteroposterior axis (Figs. 2C and 4F 
and fig. S3C). Thus, the early-generated gra- 
dients have been found to be transient, changing 
their spatial expression throughout development 
(4). In conclusion, primate telencephalic regions 
involve a code of sequentially regulated genes,
from NSCs throughout their progression along 
defined region-specific state transitions. 


Transcriptomic diversification of the 
excitatory neurons across prospective 
cortical regions and areas 


Unsupervised clustering and marker profiling 
identified distinct subtypes along the excitatory 
neuronal lineages, from EOMES+ intermediate
precursor cells (IPCs) to deep-layer (DL, SOX5+)
and upper-layer (UL, CUX2+) excitatory neurons
(Fig. 5A and fig. S11, A and B). Integration
with an adult macaque PFC dataset (30) fur- 
ther depicted these trajectories, predicting ma- 
ture identities of fetal neurons (Fig. 5B and fig. 
S11, A, C, and D). DL neurons emerged at E37-43
and peaked at E54-64, promptly diversifying
into subplate (L6B, from NR4A2+/GRID2+),
corticothalamic (L6CT, from SYT6+), and intratelencephalic
(L6IT-1 and L6IT-2, from OPRK1+)
subtypes. The UL lineage was evident at E77-78 
and enriched at E93; however, its diversifica- 
tion into adult cell types was not clear, sug- 
gesting that additional time was required for 
their maturation. Neurogenesis dynamics also 
varied across cortical regions. For instance, 
occipital UL neurons emerged later than other 
regions, yet displayed faster maturation (fig. 
S12, A and B). These analyses define the cel- 
lular dynamics underlying the laminar orga- 
nization of the different neocortical areas in 
primates. 

Differential expression and AUC (area under 
the curve) score analyses along the excitatory 
neuron pseudotime and hierarchical clustering 
highlighted increasing regional and area di- 
vergence at late differentiation phase, identify- 
ing more region-specific genes in the frontal 
cortex (Fig. 5, C and D, and fig. S12, C and D). 
Genes enriched in frontal DL neurons included 
protocadherins (PCDH10/17), whereas area divergence
in UL neurons was defined by RA signaling
members (CYP26A1, CBLN2, and MEIS2)
in PFC and BCL6 in M1C, as previously reported
(10, 31) (Fig. 5D). However, non-negligible 
regional differences were also detected in IPCs, 
likely representing earlier cell-autonomous 
events seeding neuron diversification. Further- 
more, whereas region-specific signatures largely 
overlapped between DL neuron subtypes, they 
were scarcely shared between DL and UL neu- 
rons, suggesting that distinct molecular pro- 
grams govern the establishment of regional 
identities in inner versus outer cortical layers 
(fig. S12E). RNAscope on macaque brains further
confirmed divergent expression patterns
across regions, outlining the frontal-caudal 
gradient of BCL6 and CBLN2 (Fig. 5E). Finally, 
transcriptomic comparisons with age-matched 
prenatal human excitatory neurons (16) showed
human-biased signatures, including RBP1, which
was more prominent in the prefrontal cortex 
(fig. S12, F and G). These analyses delineate
the developmental dynamics underlying regional
diversification of cortical excitatory neurons,
highlighting identity refinement during
their maturation.

Next, we asked whether the protomap of the
RG cells was related to the area specificity of
the excitatory neurons. By intersecting region-specific
genes of early RG cells and neurons,
we found the genes expressed throughout the
whole lineage progression. These included HOPX,
CYP26A1, and RBP1 in frontal, NR2F1 and
RSPO3 in occipital, and NR2F2 and CYP26B1
in temporal regions, likely defining neural cells’
regional identity across development (Fig. 5F
and fig. S13A). In addition, bulk RNA-seq,
across neurogenesis of monkey regional cortical
NSCs differentiated in vitro (29), identified
many region-specific genes expressed in neurons
that were also present in the excitatory
neurons in vivo, including RA signaling components
(RBP1, CYP26A1, BRINP1, and CBLN2)
and synaptic genes (LRFN2 and CAMKV) (fig.
S13, B to E). Because these in vitro neurons lack
connectivity inputs from other brain regions,
these identified region-specific genes might reflect
the intrinsic events underlying neuronal
regional divergence, and were more prominent
in frontal than occipital cortex. Altogether these
data denote the potential contribution of early
and late cell-autonomous mechanisms in neuronal
diversity across neocortical areas.

[Fig. 4 graphic. Recoverable labels: (A) UMAP of NSC subtypes with
pseudotime, including NESC (DIRAS3+), NESC (TEX15+), vRGE (PMP22+),
tRG, ependymal, oRG (TNC+), and oRG (APOE+); (B) subtypes by region
and gestational age (E), with oRG markers and vRGE (PMP22+) markers;
(C) oRG cell progression and apical NSC progression; (D) scaled
expression along pseudotime by region, TF versus not TF; (E) regional
enrichment in NESC and vRGE, in vRGL, and in oRG, with r = 0.74, 0.52,
and 0.23; (F) macaque sections labeled for ZIC4, NKX2-1, ZIC3, ZIC1,
LHX9, and DAPI, with dorsal-ventral and anterior-posterior orientation.]

Fig. 4. Transcriptomic variation of NSC progression across cortical
(iii) progression. (D) Region-specific gene cascades along the ventricular NSC
progression. (E) Region specificity correlation between early (NESCs and vRGE)
or late (vRGL) NSCs versus oRG cells. Colors denote the subtypes showing
region enrichment. (F) RNAscope of macaque brain sections. Scale bar, 500 µm
(panoramic) and 200 µm (zoom-in). Sep, septum; Th, thalamus.

Fig. 5. Spatiotemporal transcriptomic divergence of cortical neurogenesis.
(A) UMAP showing IPCs generating excitatory neurons. (B) Transcriptomic
integration of E54-110 and adult macaque PFC excitatory neurons (30). (C) Scheme
of neurogenesis (i). Number of regional DEGs along the DL (ii) and UL (iii) neuron
trajectories. (D) Region-specific gene cascades along neurogenesis (i and ii).
Region- and area-specific genes in late excitatory neurons (iii and iv). (E) RNAscope
in macaque brain sections. Scale bar, 500 µm (panoramic) and 200 µm (zoom-in).
(F) Shared region-specific genes between RG cells and excitatory neurons.

In contrast to the divergence of the excita- 
tory neurons, the inhibitory neurons, which 
were distinguished in subtypes distributed across 
regions and ages, such as MGE (LHX6+)- and
CGE (NR2F2+ and/or SP8+)-derived subtypes,
showed limited cortical area differences (fig.
S14, A to D). The area-specific genes expressed
in LHX6+ interneurons were paralleled in
NR2F2+/SP8+ subtypes and vice versa (fig. S14,
E and F), indicating transcriptional overlapping
of area identities between interneurons and sug- 
gesting that later cues might eventually con- 
tribute to their further diversification in the 
cortex (32). 


Spatial transcriptomic divergence 
across gliogenesis 


We next focused on gliogenesis trajectories. 
Through unsupervised clustering, cell trajec- 
tory reconstruction inference, and pseudo- 
time analysis, we distinguished late RG cell 
subtypes, which include vRG,, tRG, and oRG 
cells, transitioning into excitatory neurons or 
EGFR-expressing glial intermediate precursor
cells (gIPCs) (33), which diverge toward
astrocytes or oligodendrocytes (Fig. 6, A and B, 
and fig. S15A). Comparative analyses with 
multiple fetal and adult human, macaque, and 
mouse datasets confirmed our annotation and 
distinguished astrocytes in putative interlaminar
(GFAP+) and protoplasmic (MFGE8+ and
EGFR+) subtypes (fig. S15, B and C), suggesting
that astrocytic adult identities emerge during 
mid-fetal stages in primates. 

To define the transcriptomic programs under- 
lying the switch of RG cells from neurogenic to 
gliogenic potential, we identified the top-ranked 
genes expressed at each lineage branch (Fig. 6C, 
fig. S15D, and table S9). Known regulators of 
astrocytic fate such as the chromatin remodel- 
ing factors HMGN3 and HMGB2 and TFs such 
as PAX6, HES1, and SOX2/3/6 were expressed
by late RG cells; OLIG1/2 and ASCL1, by gIPCs;
and SOX10 and NKX2-2, by oligodendrocytes,
whereas astrocytes expressed many RG genes. 
These data reveal a temporally orchestrated 
combination of genes at the divergence from 
neurogenesis to gliogenesis during monkey 
corticogenesis (34, 35). 

DEG analysis indicated that glial cells diverge 
across areas less than excitatory neurons do. 
However, astrocytes displayed a higher number
of DEGs, including TFs, than did gIPCs and
oligodendrocytes at E110, and expressed area-specific
genes, such as SLC35F1 in prefrontal
and STAT3 in temporal cortex, which denoted 
their distinct spatial molecular features (Fig. 6, 
D and E, and fig. S15E). Together, these results
indicate that transcriptomic variation of the 
astrocytes across cortical areas is more accen- 
tuated during late differentiation. 

L-R pair analysis among neural and non- 
neural cells predicted the highest number of 
putative interactions, with the most prominent 
area variation between endothelial cells and 
astrocyte IPCs or astrocytes (Fig. 6F). Endothelial
cells showed low transcriptional variation
among the areas (fig. S15F), suggesting
that astrocytes might respond differently to 
their signals. We identified L-R pairs of di- 
verse signaling pathways displaying variation 
across time and areas (fig. S15G). Notch and 
Midkine (MDK) signaling-related L-R pairs 
displayed similar expression of the ligands 
(JAG1/2, DLL4, and MDK, respectively) in
endothelial cells, in contrast to the variation
of the receptors found in the astrocytes
(NOTCH1/2/3 and SDC2 and ITGB1, respectively)
(Fig. 6G). These data suggest that astrocytes have 
area-specific competence to respond to the en- 
dothelial cell signals, which might contribute 
to their transcriptomic variation across areas. 


Spatiotemporal expression of disease-risk 
genes in early telencephalic development 


Alterations of cortical development are impli- 
cated in neuropsychiatric diseases (36); how- 
ever, little is known about the function of the 
risk-associated genes in the NSCs of the early 
telencephalon. We curated gene lists associ- 
ated with major neuropsychiatric and neuro- 
degenerative disorders, as well as brain cancers 
(table S10). Expression enrichment analysis 
showed that risk genes for multiple neuro- 
psychiatric disorders were prominent in ex- 
citatory and inhibitory neurons, whereas 
glioblastoma-associated genes were highly 
expressed in oRG and glial cell precursors, as 
previously reported (18, 37-40) (Fig. 7A and
fig. S16A).

This analysis unveils the most salient signals 
in neurons; however, it might mask the gene 
expression patterns in brain organizer progeni- 
tors and RG cells, whose dysfunctions were sug- 
gested as early fetal risks for neuropsychiatric 
diseases (41). Although no enrichment of any
disease gene set was observed in organizer 
domain subtypes (Fig. 7A), 26.1 to 36.2% of the 
genes from each list were expressed in these 
cells (in ≥10% of cells) (fig. S16B). Among these
risk genes, 132 overlapped with patterning 
center subtype markers, of which 26 were 
distinctive of these organizer domains and 
the other 106 also appear later in development 
in other cells (Fig. 7B and fig. S16C). The 26 
genes included well-known patterning regu- 
lators, such as FGF8 expressed in the RPC, and 
other genes, for example, the autism spectrum 
disorder (ASD)- and glioma-associated gene 
MET, previously detected in late prenatal and 
postnatal human excitatory neurons (13, 30), 
that was now found expressed early selectively
in the cortical hem (Fig. 7B). Many of these 
genes exhibited no or limited expression in 
late gestation and postnatal human cortex, fur- 
ther suggesting a function restricted to early 
developmental phases (fig. S17, A and B). These 
analyses indicate that certain risk genes are 
expressed earlier in the telencephalic organi- 
zers, with a putative role in regional patterning 
and NSC specification. 
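
As an illustration of how the share of listed risk genes detected in a cell population could be tallied, the following Python sketch counts a gene as expressed when it has a nonzero UMI count in at least 10% of the cells. The function name, the toy matrix, the placeholder gene names, and the nonzero-count criterion are assumptions made for this example; they are not taken from the analysis pipeline used in this study.

```python
import numpy as np


def expressed_fraction(counts, gene_names, gene_list, min_cell_frac=0.10):
    """Return (genes from gene_list that pass, their share of gene_list).

    counts: genes x cells array of UMI counts for one cell population.
    A gene passes when its count is nonzero in at least min_cell_frac
    of the cells (an assumed definition of "expressed").
    """
    wanted = set(gene_list)
    if not wanted:
        raise ValueError("gene_list must contain at least one gene")
    counts = np.asarray(counts)
    # Fraction of cells with a nonzero count, one value per gene.
    detected = (counts > 0).mean(axis=1)
    passing = {g for g, f in zip(gene_names, detected) if f >= min_cell_frac}
    hits = sorted(wanted & passing)
    return hits, len(hits) / len(wanted)


# Toy data: 3 genes x 10 cells; gene names are placeholders.
toy_counts = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # GENE_A: detected in 10% of cells
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # GENE_B: never detected
    [2, 2, 2, 2, 2, 0, 0, 0, 0, 0],  # GENE_C: detected in 50% of cells
]
hits, share = expressed_fraction(
    toy_counts, ["GENE_A", "GENE_B", "GENE_C"], ["GENE_A", "GENE_B", "GENE_D"]
)
print(hits, round(share, 3))
```

Genes in the list that are absent from the matrix (GENE_D here) stay in the denominator, so the share is reported relative to the full curated list.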

Similar analysis conducted for the NSCs 
showed enrichment of gene sets, including the 
multiple ASD- or glioblastoma-associated genes, 
in both dorsal and ventral late RG cells (tRG 
and/or oRG cells). However, other risk genes, 
associated with diseases including attention- 
deficit/hyperactivity disorder (ADHD), Tourette 
syndrome (TS), schizophrenia (SCZ), and bipolar 
disorder (BD), exhibited expression in early 
dorsal and/or ventral NSCs (Fig. 7C and fig. S18, 
A and B). Thus, neurodevelopmental disorder- 
associated genes might function throughout 
the progression of dorsal and ventral telen- 
cephalic NSCs. 

Intersecting disease-associated genes with 
top regionally enriched subtype markers revealed
spatial- and cell type-expression bias of
the risk genes across primate telencephalic
development (Fig. 7D and fig. S19A). For ex- 
ample, the ASD-associated gene CDON was 
preferentially expressed by anterior/frontal 
NSCs and SATB1 by frontal DL neurons (37),
whereas THBS1 was preferentially expressed
by occipital UL neurons. Similarly, glioblastoma
risk genes such as HEY1 and HES1 were
enriched in anterior/frontal NSCs or posterior/ 
occipital NSCs, respectively. Thus, neuropsychi- 
atric disorders and cancers might have region- 
specific patterns of risk. 

In conclusion, these data point to telencephalic 
regional patterning and NSC progression as 
possible risk events for the origins of neuro- 
developmental disorders. Moreover, the mo- 
lecular programs underlying these NSC events 
might be dysfunctionally recapitulated in brain 
cancers (42). 


Discussion 


This work reveals the dynamic transcriptomic 
programs and cellular events underlying the es- 
tablishment of the telencephalic regional identity 
throughout macaque fetal brain development. 

NSCs determine the layout of the cerebral 
cortex at the earliest stages of fetal develop- 
ment by a coordinated expression of genes 
and enhancers’ activity in the protomap (1, 6).
We characterized the transcriptome of the pri- 
mate telencephalic organizers and the regu- 
latory networks orchestrating their patterning 
function in discrete domains. Early regional- 
ized morphogens likely induce region-specific 
gene expression cascades in competent NSCs 
that traverse intrinsic spatiotemporal state 
transitions. 

Fig. 6. Transcriptomic versatility of gliogenesis across cortical areas.

Differences were observed between mouse
and monkey organizers. GALP and GAL are
expressed in the anteroventral domain of the
early monkey telencephalon but are not detected
in the mouse at equivalent ages. As
these neuropeptides are associated with more
mature cortico-hypothalamic circuitries (43),
our data suggest a role for both ligands as
modulators of proliferation and likely differentiation
of early fetal NSCs in primates. However,
GALP enhances proliferation preferentially in
frontal cortical RG cells, pointing to potential
mechanisms underlying patterning and evolutionary
expansion of the primate cortex.
Our data highlight two waves of spatial diversification
of the NSCs, occurring in early
ventricular NSCs and in late oRG cells. Neurons
and astrocytes exhibit high regional divergence
in their terminal differentiation phase, likely
influenced by synaptic inputs or signaling from
the vasculature (44, 45). However, we identified
region-specific genes expressed from NSCs
throughout their differentiation into neurons,
as well as genes expressed in neurons despite
the lack of brain area connections, as seen in vitro
for the prefrontal-enriched RA signaling-associated
CBLN2, suggesting that cell-intrinsic
mechanisms might contribute to defining regional
neuronal diversity. Thus, these data
support a model in which cell-autonomous
programs characterize the early specification
of the NSCs and their spontaneous progression,
and then an interplay of intrinsic and
extrinsic cues might further shape the identity
of neurons and astrocytes later during
corticogenesis (46).

subtype expressed in no more than three other subtypes, excluding gene markers.

non-neural cells. Gene-disease association on the left. (C) Expression
enrichment cascades of disease genesets along the temporal progression of
dorsal and ventral NSCs. (D) Top regionally enriched (dot colors) disease

Several studies point to neuronal and glial 
dysfunction during mid-late corticogenesis 
and postnatal ages at the origins of many 
neuropsychiatric disorders (18, 37, 38). How- 
ever, we found risk genes that were expressed 
in brain organizers and/or in dorsal and ven- 
tral NSCs, suggesting a potential earlier devel- 
opmental origin for these disorders, which 
implicates dysfunctional patterning of the 
telencephalon and altered spatiotemporal iden- 
tity of the RG cells along their neurogenic or 
gliogenic progression. We found that glioblas- 
toma and neurodevelopmental disorders, such 
as ASD and ADHD, share risk genes potentially 
implicated in brain organizer and NSC func- 
tions, suggesting that genetic lesions causing 
both diseases might converge to alter similar 
NSC gene networks (47, 48). 

The absence of epigenomic data paralleling 
our transcriptomic analysis limits the under- 
standing of gene expression regulation across 
regions and species. However, integrating these 
data with other datasets, as we have shown, 
will help us to better understand primate and 
human brain formation, evolution, and diseases 
and even improve cellular systems for modeling 
neurogenesis and its disorders in vitro. 


Materials and methods 


All procedures involving animals, including 
monkeys and mice, were carried out according 
to guidelines described in the Guide for the 
Care and Use of Laboratory Animals, and were 
approved by the Yale University Institutional 
Animal Care and Use Committee (IACUC).


Caesarean sections of the pregnant monkeys 
and collection of the fetal brains 


Rhesus macaque monkeys were bred in the Rakic
and Sestan primate breeding colony at Yale.
Timed pregnant monkeys were subjected to
caesarean section at the required gestational
age, performed by Yale’s Veterinary Clinical
Services (VCS). Monkeys were first sedated 
with ketamine (3 mg/kg) and atropine sulfate 
(Lily, 0.2 mg/kg). A butterfly catheter was 
introduced into the saphenous vein for con- 
tinuous administration of fluids to prevent 
dehydration. Intravenous leads were secured 
subcutaneously for monitoring heart rate and 
respiration throughout the surgical procedure 
performed under isoflurane anesthesia and 
strict sterile procedures. A midline incision was 
made in the abdominal wall and the uterus 
gently exposed through the opening. The uterus 
was then incised between the primary and 
secondary lobes of the discoid placenta, the 
chorioallantoic membrane was punctured 
and the fetus delivered, decapitated still under 
the effects of anesthesia from maternal blood. 
The head of the fetus was transported to an 
adjacent room housing a BSL2 hood, where brain
tissue was dissected for single cell transcriptomics
or fixed in paraformaldehyde (PFA).
After delivery, the uterine and abdominal 
walls of the mother monkey were sutured in
layers. Post-operatively, the animals were mo- 
nitored several times a day until full recovery. 


Fetal monkey brain single-cell dissociation 


Fetal macaque brains were isolated from E37 
to E110, put on a dish containing PBS and 
sectioned. Telencephalic regions were identi- 
fied and dissected using a blade. For E37-43 
brains, the entire anterior and medio-posterior 
parts of the hemispheres were cut. Each tissue 
was incubated with HBSS-Papain (2 mg/ml,
BrainBits, PAP) for 15 (for the early ages)
to 30 (for more advanced ages) min at 37°C.
The solution was removed and the tissue gently
triturated in HBSS-DNase I 0.1 mg/ml solution
(STEMCELL Technologies, 07900) using a 2 ml
pipette. Then, samples were filtered through
40 µm cell strainers (Falcon, 352340) and the
cells counted with an automatic cell counter 
(ThermoFisher Scientific). Samples were diluted 
in HBSS to 1000 cells/microliter and processed 
for single cell RNAseq analysis within 20 min 
at Yale Center for Genome Analysis (YCGA) 
core facility. 


Dissection of the mice 


Pregnant mice were delivered from Charles 
River to the animal facility at Yale. Animals 
were euthanized using a CO2 chamber. The embryos
were harvested, decapitated, and collected
in PBS.


Fixation and sectioning of the brain tissue 


Macaque and C57/BL6 mouse fetal brains were
dissected and immersed in 4% PFA overnight
(ON) at 4°C. Fixed brain blocks were immersed
in step-gradients of sucrose/PBS up to 30% for
2-3 days at 4°C, then embedded in OCT and
frozen at -80°C. Sagittal sections were cut at 25 µm
for the monkey brain tissue. Sagittal or coronal
sections were cut at 15 µm for mouse brain tissue.
Sections were prepared using a Leica CM3050S 
cryostat and stored at -80°C until use. 


Construction of 10x Genomics Single Cell 3’
RNA-Seq libraries and sequencing


Sample Preparation. The first step for the 
construction of scRNA-Seq library involved 
the preparation of the single cell suspension 
described above. GEM Generation and Barcoding.
Single cell suspension in RT Master
Mix is loaded on the Single Cell A Chip and
partitioned with a pool of about 750,000 barcoded
gel beads to form nanoliter-scale Gel
Beads-In-Emulsions (GEMs). Each gel bead
has primers containing (i) an Illumina R1
sequence (read 1 sequencing primer), (ii) a 16
nt 10x Barcode, (iii) a 12 nt Unique Molecular
Identifier (UMI), and (iv) a poly-dT primer sequence
(30 nt). Upon dissolution of the Gel
Beads in a GEM, the primers are released and 
mixed with cell lysate and Master Mix. Incu- 
bation of the GEMs then produces barcoded, 
full-length cDNA from poly-adenylated mRNA. 
Post GEM-RT Cleanup, cDNA Amplification 
and library construction. Silane magnetic beads 
are used to remove leftover biochemical rea- 
gents and primers from the post GEM reaction 
mixture. Full-length, barcoded cDNA is then 
amplified by PCR to generate sufficient mass 
for library construction. Enzymatic Fragmen- 
tation and Size Selection are used to opti- 
mize the cDNA amplicon size prior to library 
construction. R1 (read 1 primer sequence) is
added to the molecules during GEM incuba- 
tion. P5, P7, a sample index, and R2 (read 2 
primer sequence) are added during library 
construction via End Repair, A-tailing, Adaptor 
Ligation, and PCR. The final libraries contain 
the P5 and P7 primers used in Illumina bridge 
amplification. Sequencing libraries. The Single 
Cell 3’ Protocol produces Illumina-ready se- 
quencing libraries. A Single Cell 3’ Library 
comprises standard Illumina paired-end con- 
structs which begin and end with P5 and P7. 
The Single Cell 3’ 16 bp 10x Barcode and 12 bp 
UMI are encoded in Read 1, while Read 2 is 
used to sequence the cDNA fragment (91bp). 
Sequencing a Single Cell 3’ Library produces 
a standard Illumina BCL data output folder. 
The BCL data will include the paired-end 
Read 1 (containing the 16 bp 10x™ Barcode 
and 12 bp UMI) and Read 2 and the sample 
index in the i7 index read. Minimum sequencing
depth is 20,000 read pairs per cell.
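
The Read 1 layout described above (a 16 nt 10x Barcode and a 12 nt UMI) can be illustrated with a minimal Python sketch. It assumes that the barcode occupies the first 16 bases and the UMI the following 12, in the order the components are listed in the text; the function name and the example sequence are hypothetical and serve only to show the arithmetic of the read structure.

```python
from typing import NamedTuple

BARCODE_LEN = 16  # 10x Barcode length stated in the text (nt)
UMI_LEN = 12      # UMI length stated in the text (nt)


class Read1Fields(NamedTuple):
    barcode: str
    umi: str


def split_read1(read1: str) -> Read1Fields:
    """Split a Read 1 sequence into its cell barcode and UMI.

    Assumes the barcode comes first and the UMI directly after it;
    any trailing bases are ignored.
    """
    needed = BARCODE_LEN + UMI_LEN
    if len(read1) < needed:
        raise ValueError(
            f"Read 1 has {len(read1)} bases, need at least {needed}"
        )
    return Read1Fields(read1[:BARCODE_LEN], read1[BARCODE_LEN:needed])


# Hypothetical 28-base Read 1 used only for illustration.
example = split_read1("ACGTACGTACGTACGT" + "TTGCAATTGCAA")
print(example.barcode, example.umi)
```

In practice this splitting is handled by the alignment and counting software; the sketch only makes explicit why Read 1 must cover at least 28 bases.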


Single-cell RNA-seq data processing 
and filtering 


Cell Ranger was applied to align the scRNA-seq
reads to the rhesus macaque genome assembly
Mmul_10 together with the gene annotation
file from NCBI RefSeq (release 103), followed
by barcode counting and unique molecular
identifier (UMI) quantification. The resulting
filtered gene-by-cell UMI count matrices were
used for additional quality control and filtering.
An initial clustering was performed for
each sample using Seurat (49) to spot potential
low-quality cell clusters, which include cells
with a low number of UMIs and/or a high percentage
of mitochondrial UMIs. The resulting
count matrices were used in the scrublet package
(50) to predict the doublet score of each cell.
Cell clusters with high doublet scores and exhibiting
combinatory expression of two different
cell type markers were considered as
doublet clusters and removed. Because samples
might behave differently, a sample-wise
doublet score threshold was selected for each
sample. To further remove such outlier cells, cells
belonging to the same cell class across different
batches were clustered together for additional
rounds of quality control, which increased
the power in detection of outliers. The filtered
gene-by-cell UMI count matrices were used in
the downstream analysis.
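
The filtering criteria above can be sketched in Python as follows. This is a simplification: the procedure in the text flags and removes whole clusters, whereas the sketch applies thresholds to individual cells. The function name and every cutoff value (minimum UMIs, maximum mitochondrial fraction, per-sample doublet cutoffs) are hypothetical, since the text does not report the values used.

```python
import numpy as np


def qc_keep_mask(total_umis, mito_umis, doublet_scores, samples,
                 doublet_cutoffs, min_umis=500, max_mito_frac=0.20):
    """Return a boolean mask of cells that pass all three filters.

    doublet_cutoffs maps each sample ID to its own doublet-score
    cutoff, mirroring the sample-wise thresholds described in the text.
    """
    total_umis = np.asarray(total_umis, dtype=float)
    mito_umis = np.asarray(mito_umis, dtype=float)
    doublet_scores = np.asarray(doublet_scores, dtype=float)

    # Fraction of each cell's UMIs that come from mitochondrial genes.
    mito_frac = np.divide(mito_umis, total_umis,
                          out=np.zeros_like(total_umis),
                          where=total_umis > 0)

    # Look up the cutoff that belongs to each cell's sample.
    cutoffs = np.array([doublet_cutoffs[s] for s in samples], dtype=float)

    return ((total_umis >= min_umis)
            & (mito_frac <= max_mito_frac)
            & (doublet_scores < cutoffs))


# Four toy cells from two samples; all numbers are illustrative.
mask = qc_keep_mask(
    total_umis=[2000, 300, 5000, 4000],
    mito_umis=[100, 10, 2500, 200],
    doublet_scores=[0.05, 0.10, 0.10, 0.60],
    samples=["s1", "s1", "s2", "s2"],
    doublet_cutoffs={"s1": 0.30, "s2": 0.40},
)
print(mask)
```

Only the first toy cell passes: the second has too few UMIs, the third has a high mitochondrial fraction, and the fourth exceeds the doublet cutoff of its sample.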


Normalization, clustering, and dimension 
reduction of the scRNA-seq data 


Filtered UMI counts in each cell were first log-normalized
using the NormalizeData function
in Seurat (49), with the scaling factor set as 
10,000. To embed all cells across different 
development ages and brain regions in the 
same reduced dimension space, we applied 
fastMNN (51) and Harmony (52) to integrate
the data. Here, both methods perform batch 
correction in the reduced dimensions and they 
overall exhibited very similar results. However, 
Harmony showed slightly better performance
in preserving inter-cell heterogeneity, and
fastMNN was marginally better in recapitulating
cell differentiation lineages. Accordingly,
Harmony was largely used for
cell cluster identification and fastMNN for
lineage visualization (for example, Fig. 1). Prior
to batch correction, for each batch we identi- 
fied the highly variable genes using the variance- 
stabilizing transformation implemented in the 
Seurat package. Because the default selection 
of the highly variable genes with the highest 
frequencies across all samples might lose some 
key signals in certain developmental ages with 
smaller sample sizes, we parcellated the de- 
velopmental stages to three windows based on 
their transcriptomic similarity (E37-E43, E54- 
E78, E93-E110) and integrated the highly 
variable genes using the SelectIntegrationFeatures
function in Seurat. The union of the
genes across the three developmental periods
was chosen for downstream analysis. Then,
the normalized data were scaled separately in 
each batch (for Harmony), or together across 
all batches (for fastMNN), and used for prin- 
cipal components analysis (PCA), with elbow 
plots to select the significant principal com- 
ponents. The first 30 integrated reduced di- 
mensions were used for UMAP visualization 
with the “umap-learn” method and the “cor- 
relation” metric. Cell clustering was performed 
via “Louvain” algorithm based on the first 30 
integrated reduced dimensions with the k- 
nearest neighbor set to 25. Dividing cells from 
different cell types (for example, early and late 
RG cells) might cluster together by cell cycle 
phases rather than their identities. There- 
fore, we categorized cell clusters into different 
cycling phases based on the expression of key 
cycling genes (for example, S phase - PCNA 
and MCM5; G2M phase - MKI67 and TOP2A) 
and gene set enrichment of cell cycling genes 
calculated via the Seurat CellCycleScoring 
function, and further sub-clustered cells within 
each phase to maximize the variance contri- 
bution from cell type heterogeneity rather 
than cell cycling differences. This method 
showed better performance than the traditional
method that clusters cells using the
data with cell cycle scores regressed.


Cell subtype annotation 


Patterning center subtypes were identified 
based on several criteria: (i) temporal enrich- 
ment at E37 and E42-43; (ii) high expression of 
canonical genes expressed in neuroepithelial 
stem cells such as SOX2 and NES, and low
expression of IPC markers (for example, ASCL1,
NEUROG1, and EOMES); (iii) combinatorial
expression of known patterning center markers:
rostral patterning center (FGF8+, FGF17+)
(20, 21), cortical hem and its subsequent state,
hem-choroid plexus epithelium (RSPO3+, TTR+)
(22, 23), antihem (SFRP2+, PAX6+) (53), and
zona limitans (IRX1+, IRX2+, IRX3+, WNT3+,
WNT4+) (24); and (iv) spatial locations
consistent with the predicted identities.
Furthermore, we spotted several early subtypes
enriched at E37-43 that coclustered with
patterning center subtypes and likely represent
regionally specialized domains. These included
two NKX2-1+ subtypes (LMO1+ and NKX6-2+,
respectively) detected in anterior samples,
representing anteroventral (AV) domain cells.
Both subtypes were transcriptomically
similar to the two NKX2-1+ early GE subtypes (GE
RG NKX2-1 DLK1 and GE RG NKX2-1 OLIG1),
and all four subtypes coclustered with the
rostral patterning center (FGF17+; Fig. 2). We
also identified subtypes forming a continuous
manifold that connected with the patterning
center subtypes and the domain-specific
subtypes, resembling cell differentiation lineages.
These included the anteroventral domain
NKX2-1+/LMO1+ subtypes, which give rise to the DLX1+
IPC subtype (inIPC ASCL1 DLX1), which in turn
produces GNRH1+ interneurons and LHX8+/ZIC1+
interneurons. The rostral patterning
subtypes seemed to form lineages with the IPC FGF17
subtype and the neuron subtype Neu TAGLN3
ONECUT2. The zona limitans subtype also led
to a lineage with the IPC TCF7L2 subtype. In
addition, one NKX2-1+ subtype (RAX+) in the
posterior domain (Pos) and one SFRP1+ subtype
with limited markers in the anterior
domain (Ant) were also detected. A few other
early subtypes, including Cls FGF17 LGI, Cls
LHX9 EBF1, Cls RSPO3 SOX1, and Cls GSX2 B3GAT2,
were all small, and their identities were
left as unknown. Neural stem cells in dorsal
regions clustered together and were defined 
based on the expression profile
PAX6+/SOX2+/NES+/EOMES−/EGFR−. These cells formed
a circular shape on the UMAP layout,
representing cell cycling states, and they were also
organized along a temporal axis reflecting
the progression of their identities. This started
with putative neuroepithelial stem cells, which
were identified based on high expression
of RSPO3 and relatively lower expression of
PAX6. Two early RG cell subtypes were defined
at E37-43: one expressing FABP7 and PMP22,
enriched in anterior regions, and the other
labeled by HMGA2 and CCND1 and ubiquitously
present in all regions analyzed. Late
RG cells included two oRG cell (HOPX+/NRG1+)
subtypes, two tRG cell (CRYAB+) subtypes, and
one vRG cell subtype directly connecting with
early RG cells and showing low expression of oRG
and tRG markers. There was also an ependymal
subtype marked by expression of FOXJ1 as well as
several CFAP genes (for example, CFAP45
and CFAP54). Neural stem cells in ventral
regions also express SOX2 and NES, but only the
late RG cells express PAX6. We also categorized 
them into different subtypes based on
genes similar to those used for dorsal neural stem
cells, including HMGA2, HOPX, and CRYAB,
as well as ventral-specific signatures such as
NKX2-1, NKX6-2, and OLIG1. Cajal-Retzius cell
lineages were extracted based on their evident
RELN expression. Anterior-enriched Cajal-Retzius
cells showed higher expression of ETV1, whereas
the posterior-enriched ones had high TP73
expression (54). We also found two putative
RSPO3+ IPC subtypes enriched in the posterior
regions, marked by NEUROG1
and NHLH1 expression, respectively. Excitatory
neuron lineages included IPCs marked by
EOMES expression and postmitotic neurons
expressing NEUROD2. The IPCs consisted of
three subtypes: a VIM+ subtype transcriptomically
more similar to radial glial cells, a NEUROG1+
subtype dominating the cycling IPCs and also
containing noncycling cells, and an NHLH1+
postmitotic subtype derived from the NEUROG1+ subtype.
Subtypes of excitatory neurons were broadly 
categorized into deep and upper layer neurons 
based on their expression of SOX5 and CUX2, 
respectively. Deep layer neurons included two 
nascent subtypes (PALMD+ and ID2+), a
corticothalamic subtype (SYT6+), two intratelencephalic
subtypes (OPRK1+/SULF1+ and OPRK1+/NR4A2+),
and an L6B subtype (NR4A2+/GRID2+)
(30). Upper layer neurons contained two nascent
subtypes (PALMD+ and ADRA2A+) and
one intratelencephalic type (ACTN2+) (30). We
also identified a putative deep layer subtype
(SOX5+/KCNV1+) that is transiently present
at E62-64 and one excitatory neuron subtype
enriched in the posterior cingulate cortex (TSHZ2+/NR4A3+).
Interneurons were classified based on
DLX1, DLX2, GAD1, and GAD2 expression.
IPCs in the interneuron lineages were categorized
based on their ASCL1 expression and
their coclustering on the UMAP, connecting
with postmitotic interneurons. Putative identities
of the postmitotic interneurons were
classified by the expression of markers highly
correlated with their developmental origins
(MGE: LHX6; C/LGE: NR2F2, SP8, MEIS2)
and by transcriptomic integration with existing
datasets with established identities, including an
independent developing macaque interneuron
scRNA-seq dataset (55) and an adult macaque
snRNA-seq dataset (30). The LHX6+ interneurons
consisted of three major branches: CRABP1+
interneurons, recently reported to be enriched in
primates (55, 56); an LHX8+ branch enriched at E37-43,
likely contributing to cholinergic neurons (57);
and LHX6+/CRABP1− interneurons, dominating
the MGE-derived interneurons and giving
rise to the majority of the SST and PVALB
interneurons. Within the LHX6+/CRABP1− group, we
spotted three potential sub-branches: an SST+/NPY+
branch giving rise to long-projecting
inhibitory neurons (30, 32); a GUCY1A2+/RELN+/DCN+
branch producing the major PVALB and
SST interneurons; and a CCK+ branch generating
LAMP5 LHX6 interneurons, which mapped to
mouse hippocampal Ivy cells and were
recently reported to be enriched in abundance
in the primate neocortex (56, 58). The CGE- and
LGE-derived interneurons encompassed two 
major branches marked by NR2F2/SP8 and 
MEIS2/SP8 expression, respectively. Within 
the MEIS2/SP8 lineage, cells were largely par- 
cellated based on the expression of PAX6 
(olfactory bulb neurons) and FOXP1/FOXP2 
(striatum spiny projection neurons) (55). The 
NR2F2/SP8 lineage consisted of cell subtypes
giving rise to different adult interneuron
subclasses (30): LAMP5+ interneurons becoming
the LAMP5 RELN subclass, KIT+ interneurons
becoming the ADARB2 KCNG1 subclass, and VIP+
interneurons becoming the VIP subclass. The glial
cells were classified into three major groups
based on expression of OLIG2 (oligodendrocyte
lineage-related cells), AQP4 (astrocyte
lineage-related cells), and EGFR (glia precursor cells).
The oligodendrocyte lineage-related cells were
further divided by the expression of PDGFRA
(oligodendrocyte precursor cells, OPCs), PDGFRA
and MKI67 (oligodendrocyte precursor cells
in proliferation stage, OPC PDGFRA MKI67),
PCDH15 (late oligodendrocyte precursor cells),
and MBP (oligodendrocytes). Note that
oligodendrocytes are sparsely represented at the
analyzed ages. The astrocyte lineage-related cells
were divided into three subtypes by the
expression of GFAP, EGFR, and MFGE8,
respectively. Regarding glia precursor cells, we used
combinatorial expression of genes to sort
them into astrocyte intermediate precursor
cells (aIPCs, EGFR+/AQP4+/IGFBP2+),
oligodendrocyte intermediate precursor cells (oIPCs,
EGFR+/PDGFRA+/DLL1+), glia intermediate
precursor cells (gIPCs, EGFR+/AQP4−/PDGFRA−), and
glia intermediate precursor cells in proliferation
stage (EGFR+/MKI67+). The rest of the
immune cells and vascular-related cells were
categorized using the following strategies. All
immune cells were identified as PTPRC+, with
the microglia subtype further identified as C1QC+
and T cells identified as CD69+ (30). Two red
blood lineage cell subtypes were classified
based on the expression of HBA1 and their
unique expression of HBE1 and SNCA,
respectively. Vascular cells were characterized by their
FN1 expression and specific expression of other
subtype markers: endothelial cells (CLDN5+),
pericytes (GRM8+), smooth muscle cells (ACTA2+),
and vascular leptomeningeal cells (CEMIP+) (30).
We also spotted two putative mesenchymal cell
subtypes: one expressing LUM, consistent with
the recently reported mesenchymal
cells in the early developing human telencephalon
(59); the other marked by FOXD3 and
PLP1 expression, likely representing neural crest
cells (60).


Identification of cell subtype markers and genes 
with region- and area-divergent expression 


To identify genes differentially expressed be- 
tween cell subtypes or brain regions and areas, 
the Wilcoxon rank sum test was used, with a minimum
expression ratio of 0.1 and a Bonferroni-adjusted
p value threshold of 0.01.
In certain analyses, as detailed in the following
sections, we incorporated additional requirements
to filter the differentially expressed genes.
For example, we inspected expression ratio fold
changes, with a pseudo-value of 0.01 added to
the gene expression ratios in the examined cell
subtype (numerator) and background cells
(denominator). Other criteria such as log fold
changes of average expression and background 
expression ratios were also taken into consid- 
eration for gene filtering. 
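
As an illustration, the expression ratio fold change with a pseudo-value of 0.01 can be sketched as follows. This is a minimal Python sketch, not the authors' code: it assumes that "expression ratio" means the fraction of cells with a nonzero count for the gene, and the function names and toy counts are hypothetical.

```python
def expression_ratio(counts):
    """Fraction of cells with a nonzero count (assumed meaning of 'expression ratio')."""
    return sum(1 for c in counts if c > 0) / len(counts)


def ratio_fold_change(subtype_counts, background_counts, pseudo=0.01):
    """Fold change of expression ratios, with a pseudo-value added to both terms."""
    numerator = expression_ratio(subtype_counts) + pseudo
    denominator = expression_ratio(background_counts) + pseudo
    return numerator / denominator


# Toy counts for one gene: detected in 3 of 4 subtype cells and 1 of 10 background cells.
subtype = [2, 0, 5, 1]
background = [0, 0, 0, 3, 0, 0, 0, 0, 0, 0]
fold_change = ratio_fold_change(subtype, background)
print(f"{fold_change:.2f}")
```

The pseudo-value keeps the ratio finite when a gene is undetected in the background cells, at the cost of slightly shrinking large fold changes.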


Inference of regulatory networks in 
telencephalic organizer cells and NSCs from 
different regions 


The inference of transcription factor regulatory 
networks was based on the SCENIC workflow 
(61), by integrating motif enrichment in the 
promoter regions and gene-gene co-expression. 
The putative promoter region (upstream 2000 
bases and downstream 500 bases of the trans- 
cription start site) of each gene, as well as the 
motifs of human transcription factors, were 
prepared (62). The R package PWMEnrich was used
to perform the motif enrichment analysis, with
the p value threshold set at 0.05 and the raw score
threshold set to 2.5, to retain only transcription
factor genes with significant motif enrichment
at the promoter of the given gene. This analysis
generated raw regulons, each of which is a
module of genes comprising a transcription factor
gene and a list of putative targets. Then,
within each relevant cell subtype, we calcu- 
lated the top 200 subtype markers ranked 
by expression ratio fold changes using the 
FindMarkers function in Seurat (49) and 
intersected the markers with the raw regu- 
lons to filter transcription factors and targets. 
In this analysis, for simplicity, we merged the 
markers of the two anteroventral NKX2-1+
subtypes (AV NKX2-1 LMO1 and AV NKX2-1
NKX6-2) and the markers of the two GE NKX2-1+
subtypes (GE RG NKX2-1 DLK1 and GE RG
NKX2-1 OLIG1), respectively. To identify
coexpressed transcription factors and targets,
we correlated their expression across pseudobulk
samples. Specifically, we ordered the
cells along the UMAP-1 axis, calculated via
the Seurat RunUMAP function with the number
of dimensions set to one, parcellated the cells
into 30 equal-width bins along the axis, and
removed bins with fewer than 15 cells. Average
expression was calculated in each bin, followed 
by assessing the Pearson correlation of the ex- 
pression of transcription factors and the pre- 
dicted targets. To further evaluate the robustness 
of the correlation, we also permuted the gene
expression in each subtype, maintaining the
average expression level and variance of each
gene but disrupting the gene-gene correlations.
The permutation was repeated 1000 times,
and the p value for the correlation of a given
gene pair was defined as (n + 1) / (1000 + 1), where
n represents the number of permuted correla- 
tion coefficients exceeding the actual correlation 
coefficient. Only transcription factor-target pairs 
with significant correlations (p value < 0.05 and 
coefficient > 0.3) were retained. The regulons 
in each cell subtype were then merged and 
visualized in a network with arrows indicating 
regulatory directions from transcription factor 
to targets and nodes colored by the cell sub- 
types showing the significant regulation. In 
addition, we curated several signaling pathways 
relevant to organizer functions and overlaid the 
information on the network (Fig. 2D). To com- 
plement the analysis illustrating how signaling 
components are used across telencephalic do- 
mains, we performed the following two analyses: 
(i) we measured the enrichment of signaling pathway
genes in each organizer subtype by calculating
the odds ratio of the overlap between
the signaling pathway genes and organizer subtype
markers; and (ii) we intersected the signaling pathway
genes with organizer subtype markers.
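
The pseudobulk correlation and its permutation p value, (n + 1) / (1000 + 1), described earlier in this section can be sketched as follows. This is a minimal Python sketch under stated assumptions rather than the authors' R implementation: the permutation is assumed to shuffle one gene's values across cells (which preserves that gene's mean and variance), and the function names and simulated data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def binned_means(coord, values, n_bins=30, min_cells=15):
    """Average values in equal-width bins of coord; sparse bins become None."""
    lo, hi = min(coord), max(coord)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for c, v in zip(coord, values):
        bins[min(int((c - lo) / width), n_bins - 1)].append(v)
    return [sum(b) / len(b) if len(b) >= min_cells else None for b in bins]


def drop_sparse(means):
    """Remove bins that did not reach the minimum cell number."""
    return [m for m in means if m is not None]


def permutation_p(coord, tf, target, n_perm=1000, seed=0):
    """Observed binned correlation and its permutation p value (n + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    tf_means = drop_sparse(binned_means(coord, tf))
    observed = pearson(tf_means, drop_sparse(binned_means(coord, target)))
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(target)
        rng.shuffle(shuffled)  # assumed permutation scheme: shuffle cells within the gene
        r = pearson(tf_means, drop_sparse(binned_means(coord, shuffled)))
        exceed += r > observed
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated example: 600 cells ordered along one axis, two co-varying genes.
rng = random.Random(1)
coord = [i / 599 for i in range(600)]
tf = [c + rng.gauss(0, 0.1) for c in coord]
target = [2 * c + rng.gauss(0, 0.1) for c in coord]
observed, p_value = permutation_p(coord, tf, target, n_perm=200)
print(round(observed, 3), p_value)
```

Adding 1 to both the numerator and the denominator keeps the p value above zero, so the smallest attainable value with 1000 permutations is 1/1001.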


Transcriptomic comparison between mouse and 
macaque organizer domain subtypes 


To assess the transcriptomic similarity between 
macaque and mouse telencephalic organizer 
domains (25), we derived the subtype markers 
in each dataset using the FindMarkers func- 
tion in Seurat and extracted the shared sub- 
type markers between the two species. These 
included many key genes labeling homologous 
cell subtypes. Average expression of the shared
subtype markers was calculated across subtypes,
followed by calculation of the Pearson correlation
coefficient for each pair of subtypes
between the two datasets. To avoid noise from
background transcriptomic similarities, any 
correlation coefficients below the 80% quan- 
tile of all values were removed. The filtered 
subtype similarity was visualized in a Sankey 
plot, which illustrates subtype matching be- 
tween the two species. Alternatively, we cal- 
culated the enrichment of mouse subtype 
markers in this dataset through the AUCell 
algorithm (67). Because this method uses rank-based
expression values to assess expression
enrichment, it is robust to potential quality
differences between the mouse and macaque 
datasets. For a given set of mouse subtype 
markers, we averaged its enrichment scores 
in each macaque subtype and visualized the re- 
sults in a heat map, which recapitulated the 
subtype similarity patterns shown by the above 
correlation-based analysis. In order to find 
species-specific expression patterns in homol- 
ogous subtypes and avoid potential batch ef- 
fects, we set a high threshold for the differential 
expression test. To identify macaque-enriched 
genes, we first extracted the top 100 markers 
for each macaque subtype and removed the 
genes that were also markers of the homolo- 
gous mouse subtypes. Next, a minimum expres- 
sion ratio of 0.2 was required in macaque 
subtypes whereas a maximum expression ratio 
of 0.05 was set in the mouse homologous 
subtypes. For each macaque subtype, the top 
10 genes ranked by their expression fold changes 
between the given subtype and the background 
macaque cells were selected and visualized. 
The same approach was used to find mouse- 
enriched genes in homologous subtypes. Note
that the expression of three canonical antihem
genes, TGFA, Neuregulin 1 (NRG1), and
Neuregulin 3 (NRG3) (53), was not detected in the
macaque putative antihem (PC SFRP2).
Interestingly, we found these three genes to be specific
to mouse antihem subtypes (not shown).
However, the expression of these genes was detected
in other cell subtypes, ruling out a genome
annotation issue. Consistent with this, realignment of
the data to an independent genome annotation
from ENSEMBL showed clear expression of
SFRP2 in macaque PC SFRP2, but not of
these three genes. Another possibility is that
the sequencing depth for this subtype was too
low to capture the signals. However, PC SFRP2
has an average of 2825 UMIs, which is
comparable to other subtypes expressing the three
genes, suggesting that the sequencing depth 
might not be the issue. Thus, the undetectable 
level of these three genes in the monkey might 
be attributed to species differences in the timing 
of expression or in the developmental states of 
the antihem cells (27). 
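
The quantile filter applied to the cross-species correlation coefficients earlier in this section (values below the 80% quantile removed) can be sketched as follows. This is a minimal Python sketch, assuming the linear-interpolation quantile that R and NumPy use by default; the subtype labels and correlation values are hypothetical toy data.

```python
def quantile(values, q):
    """Linear-interpolation quantile (the default definition in R and NumPy)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_similarity(corr, q=0.8):
    """Keep subtype pairs whose correlation is at or above the q-quantile of all values."""
    cutoff = quantile([r for row in corr.values() for r in row.values()], q)
    return {
        (a, b): r
        for a, row in corr.items()
        for b, r in row.items()
        if r >= cutoff
    }


# Hypothetical macaque-by-mouse correlation table.
corr = {
    "mac_A": {"mus_A": 0.9, "mus_B": 0.1, "mus_C": 0.2},
    "mac_B": {"mus_A": 0.15, "mus_B": 0.8, "mus_C": 0.3},
}
kept = filter_similarity(corr)
print(sorted(kept))
```

The surviving pairs are the links that would be drawn in a Sankey-style plot of subtype matching.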


Ligand-receptor mediated cell-cell 
communication between patterning centers and 
early neural stem cells 


We applied two complementary expression- 
based approaches, CellChat and CellphoneDB 
(63, 64), to infer putative cell-cell communica- 
tions between organizer domain cell subtypes 
and early neural stem cells. In these analyses, 
we only included subtypes potentially secret- 
ing patterning ligands (RPC, anteroventral do- 
main subtypes, and cortical hem) and neural 
stem cell subtypes responding to these mole- 
cules (that is, NESC and early vRG from ante- 
rior, posterior regions and ganglionic eminence). 
Each subtype was downsampled to an
equal number of cells (1000 cells), because
subtype size differences could otherwise affect marker
detection in CellChat and permutation in
CellphoneDB. In both analyses, we set a minimum
expression ratio of 0.05 and a p value
threshold of 0.05. For the resulting ligand-receptor
interactions, we only considered the
directions with ligands expressed in organizer 
domain subtypes and receptors in regional 
neural stem cells. Because the output results 
from the two analyses are largely shared, we 
mainly used the CellChat-based results and 
incorporated additional interactions reported 
only by CellphoneDB-based analysis. To get a 
broad view of the interaction patterns between 
ligand-receptor pairs, we performed t-SNE 
analysis using the interaction matrix, with rows 
as ligand-receptor pair names and columns as 
cell subtype pairs. In addition, we clustered the 
ligand-receptor pairs based on their orchestrated
cell-cell interaction patterns using the robust
sparse K-means clustering algorithm. The
resulting 10 clusters were well separated on the
t-SNE layout and further confirmed the distinct
cell-cell interaction patterns mediated by
ligand-receptor pairs.
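
The downsampling of each subtype to an equal number of cells, described earlier in this section, can be sketched as follows. This is a minimal Python sketch with hypothetical cell identifiers and subtype sizes; how subtypes with fewer than 1000 cells were handled is not stated in the text, so the sketch assumes they are kept whole.

```python
import random


def downsample(cells_by_subtype, n_cells=1000, seed=0):
    """Randomly keep at most n_cells cell IDs per subtype, without replacement."""
    rng = random.Random(seed)
    return {
        subtype: rng.sample(cells, n_cells) if len(cells) > n_cells else list(cells)
        for subtype, cells in cells_by_subtype.items()
    }


# Hypothetical subtype sizes before downsampling.
cells = {
    "RPC": [f"RPC_{i}" for i in range(2500)],
    "NESC": [f"NESC_{i}" for i in range(4000)],
    "hem": [f"hem_{i}" for i in range(800)],
}
balanced = downsample(cells)
print({subtype: len(ids) for subtype, ids in balanced.items()})
```

Fixing the seed makes the random subset reproducible across reruns.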


Lineage construction from organizer domain 
progenitors to offspring cells 


To define the lineage progression from orga- 
nizer domain progenitors to their progeny 
cells and to delineate the gene cascades along
the lineage, RNA velocity analysis was performed with
the scVelo package (65). In each lineage, the UMAP
layout was first obtained via the RunUMAP
function in Seurat, and the resulting Seurat
object was converted to AnnData format for scVelo
analysis. For simplicity, and to prevent cell cycling
genes from driving the gene cascades, cycling cells
were not included in the analysis. After data
filtering, normalization, identification of highly
variable genes, and computation of moments for
velocity estimation, the dynamical model was
applied to compute the RNA velocity vectors
and pseudotime. The top 300 genes showing 
transcriptional variation along the lineages 
were visualized on heat maps. 


Transcriptomic comparisons with published data 
for neural stem cells and neurons 


We applied the following two methods to eval- 
uate cell subtype matching between datasets 
for neural stem cells and neurons. In the first 
method, cross-dataset cell subtype similarity 
was measured by Pearson correlation coefficients.
Specifically, the intersection of the
highly variable genes between this study and
a given published dataset was selected, and
the average expression of these genes in
each cell subtype was computed via the
AverageExpression function in Seurat (49),
followed by log transformation. Pearson correlation
coefficients were calculated on the
log-transformed average expression for each
pair of subtypes between this study and the
given published dataset. In the second approach,
cells from this study and a given published 
dataset were integrated and visualized on 
UMAP to evaluate the subtype alignment. 
Because there were prominent batch effects 
between this study and published datasets, largely
attributed to differences in species, developmental
stages, and technical approaches, we
applied the Seurat integration algorithm (49) as a
stringent method to remove batch effects. Here,
the intersection of the top 2000 highly variable
genes from each study was used for canonical
correlation analysis, followed by anchor finding
and hierarchical integration of normalized data
using the IntegrateData function. The integrated
data were then scaled and used for principal
components analysis and UMAP visualization.


Region-specific gene expression cascades 


We used the following approaches to con- 
struct the region-specific expression cascades. 
For each subtype in a given region, we calcu- 
lated the genes showing expression enrich- 
ment in this region compared to all other 
regions using the Wilcoxon rank sum test. To
avoid the influence of cell number differences,
for each subtype we downsampled each region
to the same number of cells. The differential
expression analysis results were further
filtered based on expression ratios, fold
changes of average expression, fold changes
of expression ratio, and Bonferroni-adjusted P
values to obtain genes with the most salient regional
enrichment. By leveraging the defined pseudotime,
which organizes cells from different regions
on the same scale, we parcellated cells from
the four regions into bins with equal pseudo- 
time width. We then calculated the average 
gene expression along the pseudotime bins for 
each region and fitted the expression to impulse
models (linear, single sigmoid, or double
sigmoid) implemented in the URD package
(66), which returned the pseudotime points
at which gene expression arises
and diminishes. We thus ordered regionally
enriched genes based on the earliest subtype
in which they displayed regional enrichment,
followed by the predicted pseudotime points
of gene expression onset and offset. To improve
visualization of the expression patterns,
average expression along the pseudotime bins
was smoothed using the loess function in R.
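
The binning of cells from several regions on a shared pseudotime scale, followed by per-region averaging, can be sketched as follows. This is a minimal Python sketch with hypothetical region names and toy values; the number of bins is an assumption because the text does not state it, and the impulse-model fitting and loess smoothing steps are not reproduced here.

```python
def binned_average(pseudotime, expression, region, n_bins=10):
    """Mean expression per region in equal-width pseudotime bins on one shared scale."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins
    sums, counts = {}, {}
    for t, x, r in zip(pseudotime, expression, region):
        b = min(int((t - lo) / width), n_bins - 1)  # the last bin includes the maximum
        sums.setdefault(r, [0.0] * n_bins)[b] += x
        counts.setdefault(r, [0] * n_bins)[b] += 1
    # Empty bins are reported as None rather than as zero expression.
    return {
        r: [s / c if c else None for s, c in zip(sums[r], counts[r])]
        for r in sums
    }


# Toy data: six cells from two regions, two pseudotime bins.
pseudotime = [0.0, 0.1, 0.6, 1.0, 0.2, 0.9]
expression = [1, 3, 5, 7, 2, 4]
region = ["frontal", "frontal", "frontal", "frontal", "occipital", "occipital"]
averages = binned_average(pseudotime, expression, region, n_bins=2)
print(averages)
```

Because the bin edges come from the pooled pseudotime range, bin k refers to the same pseudotime interval in every region.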


Gene Ontology enrichment analysis 


Gene ontology (GO) enrichment analysis was 
performed with the Bioconductor package
“topGO” (http://bioconductor.org/packages/release/bioc/html/topGO.html)
using Fisher’s
exact test followed by FDR adjustment of P values.
Only GO terms under biological processes
were included in the analyses, and a threshold
of FDR < 0.1 was used to select the significant
terms.
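
The statistics behind this enrichment test can be sketched as follows. This is a minimal Python sketch of a one-sided Fisher's exact test (hypergeometric tail) with Benjamini-Hochberg adjustment, which is assumed here to be the FDR procedure; it does not reproduce topGO itself or its graph-aware algorithms, and the gene counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P: probability of at least k annotated genes among
    n selected genes when K of the N background genes carry the annotation."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical GO term: 8 of 50 selected genes annotated, 40 of 2000 in the background.
p = fisher_enrichment_p(8, 50, 40, 2000)
fdr = benjamini_hochberg([p, 0.2, 0.03, 0.5])
significant = [q < 0.1 for q in fdr]
print(p, fdr, significant)
```

The final line applies the FDR < 0.1 threshold stated in the text.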


Augur and differential gene expression analysis 
assessing transcriptomic divergence between 
the shared subtypes across different regions 

In order to evaluate the magnitude of regional 
differences between different cell types, which 
might come from the same lineage (for exam- 
ple, IPCs, nascent and mature excitatory neu- 
rons) or belong to the same cell class (for 
example, LHX6+ versus NR2F2+/SP8+
interneurons), we applied the Augur algorithm
to measure the transcriptomic separability of
regions for each relevant subtype (67). To
capture regional variation as fully as
possible, for each batch we extracted the
highly variable genes across the cells from
all the analyzed brain regions and used the
SelectIntegrationFeatures function in Seurat to identify
the top regionally variable features. When running
the Augur algorithm, these highly variable genes
were used directly, with the mode set to
“velocity” to avoid additional detection of
highly variable genes. The Augur analysis
was performed for each pair of regions, which 
provided a detailed view of transcriptomic 
divergence of cell types across brain regions. 
In addition, we used the number of differentially
expressed genes to evaluate how regional
differences change along RG progression and
excitatory neuron differentiation and maturation
(Figs. 4C and 5C). We parcellated cells
into equal-width bins along the pseudotime
and downsampled the cells from each
region to obtain a balanced number of cells
across bins and regions. Then we applied
the Wilcoxon rank sum test and calculated the
number of differentially expressed genes between
regions along the pseudotime bins; the counts
were visualized on a log scale and smoothed
via the loess function.


Hierarchical clustering of excitatory neuron 
lineage subtypes across cortical regions 
and areas 


To check whether the regional differences of 
the excitatory neurons are correlated with the 
anatomical proximity of the brain regions they 
populate, we leveraged the refined regions/ 
areas sampled at E93 and E110. We first calculated
the highly variable genes in each
batch (here, each individual) and used the Seurat
SelectIntegrationFeatures function to capture
the top 2000 genes with the highest expression
variability across the analyzed brain regions. For
a given excitatory neuron subtype, cells from
different regions were downsampled
to the same number of cells (100 cells),
and average expression was then calculated in each region. The
resulting average expression of the 2000 highly
variable genes from the shared subtype in
different regions was used for hierarchical
clustering. Here, the distance matrices were
defined as one minus the Pearson correlation coefficient
and were subsequently used by
the hclust function with the “ward.D2” algorithm
for clustering. Dendrogram visualization was
achieved via the circlize R package. To obtain
a robust estimate of region co-clustering, we
generated 1000 bootstrap replicates from the
above 2000 highly variable genes, randomly
extracting 80% of the genes for each replicate.
For each replicate, the same subtype across
regions was clustered using the above strategy,
followed by cluster separation by cutting
the hierarchical tree into k = 3 clusters. The
frequencies of region co-clustering were measured
and visualized in heat maps. In addition, we
permuted the gene expression for 1000 replicates,
keeping the gene-wise characteristics
(for example, mean expression and variance) but
destroying the gene-gene relationships. The
average expression from the 1000 replicates
of permuted data was analyzed with the above
bootstrap clustering strategy to see how often
subtypes from different regions cluster together.
By comparing the co-clustering frequency in
the actual data versus the permuted data, we
confirmed that the region co-clustering in excitatory
neuron subtypes reflects true regional
differences.
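
The gene resampling and the co-clustering frequency can be sketched as follows. This is a minimal Python sketch, not the authors' R code: the hierarchical clustering and tree-cutting step is assumed to have been run separately for each replicate, so the second function starts from cluster labels, and the region names, gene names, and labels are hypothetical.

```python
import random
from itertools import combinations


def subsample_genes(genes, fraction=0.8, n_replicates=1000, seed=0):
    """Draw n_replicates random gene subsets, each containing a fraction of the genes."""
    rng = random.Random(seed)
    size = int(round(fraction * len(genes)))
    return [rng.sample(genes, size) for _ in range(n_replicates)]


def coclustering_frequency(label_sets):
    """Fraction of replicates in which each pair of regions shares a cluster.

    label_sets holds one dict per replicate, mapping region name to cluster label.
    """
    counts = {}
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] = counts.get((a, b), 0) + (labels[a] == labels[b])
    return {pair: c / len(label_sets) for pair, c in counts.items()}


# Gene subsets for 5 replicates drawn from 10 hypothetical genes.
replicates = subsample_genes([f"gene{i}" for i in range(10)], n_replicates=5)

# Hypothetical cluster labels from four replicates and three regions.
labels = [
    {"A": 1, "B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 2},
]
frequency = coclustering_frequency(labels)
print(frequency)
```

Running the same function on labels from permuted data gives the null frequencies against which the observed co-clustering is compared.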


Correlation of regional gene expression 
specificity between cell types 


To assess the effect of region-specific environ- 
ment on cell type identities, we correlated the 
region specificity of gene expression across 
multiple cell types. Specifically, we identified 
all the genes divergently expressed across brain 
regions and assessed their expression enrich- 
ment in each region. The gene expression fold 
changes were calculated by dividing the aver- 
age expression in the given region by the aver- 
age expression in other background regions, 
with a pseudo-value of 0.1 added to both the 
numerators and denominators. The resulting
fold changes were log2 transformed and are referred to
here as regional enrichment scores. Within
each region, the pairwise comparisons of such 
enrichment scores were visualized in dot plots 
and Pearson correlation coefficients were also 
calculated. 
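
The regional enrichment score defined above is, for a gene with average expression x in the given region and y in the background regions, log2((x + 0.1) / (y + 0.1)). A minimal Python sketch with hypothetical input values follows; the function name is an illustrative choice.

```python
from math import log2


def regional_enrichment(mean_in_region, mean_in_background, pseudo=0.1):
    """log2 fold change of average expression, with a pseudo-value on both terms."""
    return log2((mean_in_region + pseudo) / (mean_in_background + pseudo))


# Hypothetical averages for one gene in one region versus the other regions.
score = regional_enrichment(0.7, 0.1)
print(round(score, 3))
```

A score of 0 means no regional difference, and positive values mean enrichment in the given region.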


Analysis of bulk-tissue RNA-seq data 


We applied STAR (68) to align the raw reads to 
the same genome we used for the scRNA-seq 
data, followed by featureCounts (69) to compute
the read counts. DESeq2 (70) was used to
perform differential expression tests between
conditions, selecting only genes with a false discovery
rate smaller than 0.01 and a log2 fold
change greater than 1. In the differential expression
analysis, we grouped the in vitro
time points into three stages (early: days in vitro
[DIV] 1-5; middle: DIV 6-11; late: DIV 14-20).
We conducted two types of analyses to
correlate the in vitro results with the in vivo 
studies. First, we identified region-specific genes 
in the in vitro NSCs (DIV 1-5) and measured 
their regional enrichment in the age-matched
NSCs in vivo via the Wilcoxon rank sum test. We
considered genes to be significant only if they
had Bonferroni-corrected P values smaller than
0.05, expression ratios larger than 0.05, and log
fold changes of expression larger than 0. The
results were visualized in volcano plots, and the
genes displaying consistent regional enrichment
between in vitro and in vivo were highlighted
and labeled. This analysis showed that many in vivo
NSC regional identities were maintained in vitro
at the NSC stage. In the second analysis, we extracted
genes showing enrichment in the late stage
(DIV 14-20) but not in the early NSC stage
(DIV 1-5). Similar regional enrichment tests
were performed for these genes, with the exception
that the tests were performed on age-matched
excitatory neuron subtypes, that is,
E54-64 L6 CT (ExN SOX5 SYT6) and L6B (ExN
SOX5 NR4A2 GRID2) for the region-specific 
genes obtained in the differentiated neurons 
from E42 NSCs in vitro, and E93 upper layer 
excitatory neuron subtypes (ExN CUX2 ADRA2A 
and ExN CUX2 ACTN2) for those obtained 
in the differentiated neurons from E77 NSCs 
in vitro. 


Evolutionary comparisons between human and 
macaque midfetal arealization signatures 


We leveraged the transcriptomic-based age 
matching between human and macaque (13) 
to select the age-matched human and macaque 
scRNA-seq neocortical data for the evolutionary
comparisons. Here, we used the macaque E77-78
data from this study and the gestational week
(GW) 18-19 data from a developing human
scRNA-seq dataset (16), with only prefrontal
and occipital regions included because these two
regions contain more cells for all the major
excitatory neuron types. The human data were
reprocessed and annotated in the same manner
as the data in this study. To avoid potential
sequencing depth bias, we subset the human and
macaque data to orthologous genes and then
downsampled each homologous cell subtype
to the same number of UMIs. Data normalization
was performed again using only
the orthologous genes. Region-specific genes
were identified in each species using the 
FindMarkers function in Seurat. To identify 
the conserved region-specific signatures, we 
intersected the region-specific genes from 
the two species and visualized the top 20 genes 
ranked by their average expression ratio fold 
changes. We then used the same FindMarkers 
function to calculate the genes enriched in 
human or macaque homologous cell types, 
but with more stringent thresholds to avoid 
bias: minimum expression ratio of 0.1 in the 
enriched species, maximum expression ratio 
of 0.1 in the depleted species, expression ratio 
fold changes bigger than 1.5, and Bonferroni- 
adjusted p values smaller than 0.01. These genes 
were further intersected with region-specific 
genes in each species to obtain species- and
region-specific signatures. 
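
As an illustration only, the species-enrichment thresholds above can be written as a simple filter followed by a set intersection. This is a minimal Python sketch, not the R/Seurat code used in the study; the function name, the field names pct_a, pct_b, and p_adj (expression ratios in the enriched and depleted species and the Bonferroni-adjusted p value), and the gene names are all hypothetical.

```python
import math

def species_enriched(stats, ratio_min=0.1, ratio_max=0.1, fc_min=1.5, p_max=0.01):
    """Return genes enriched in species A relative to species B.

    stats maps gene -> dict with hypothetical keys:
      pct_a, pct_b : fraction of cells expressing the gene in each species
      p_adj        : Bonferroni-adjusted p value
    """
    kept = set()
    for gene, s in stats.items():
        # Expression-ratio fold change; a depleted ratio of 0 counts as infinite.
        fold = s["pct_a"] / s["pct_b"] if s["pct_b"] > 0 else math.inf
        if (s["pct_a"] >= ratio_min and s["pct_b"] <= ratio_max
                and fold > fc_min and s["p_adj"] < p_max):
            kept.add(gene)
    return kept

# Toy input (made-up values, for illustration only).
stats = {
    "GENE_A": {"pct_a": 0.40, "pct_b": 0.05, "p_adj": 1e-6},  # passes all thresholds
    "GENE_B": {"pct_a": 0.12, "pct_b": 0.10, "p_adj": 1e-4},  # fold change too small
    "GENE_C": {"pct_a": 0.30, "pct_b": 0.00, "p_adj": 0.5},   # not significant
    "GENE_D": {"pct_a": 0.05, "pct_b": 0.01, "p_adj": 1e-9},  # too rarely expressed
}
region_specific = {"GENE_A", "GENE_B"}

# Species- and region-specific signature: intersection of the two gene sets.
signature = species_enriched(stats) & region_specific
```

Only GENE_A survives both the species filter and the intersection with the region-specific set in this toy input.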


Hierarchical clustering of regional gene
expression patterns for inhibitory neurons 


Although the Augur algorithm detected lower re-
gional transcriptomic differences in inhibitory
neurons, we could still find certain genes 
differentially expressed across cortical regions 
in the shared inhibitory neuron types. In order 
to obtain an overview of their global expres- 
sion patterns across regions, we applied a hie- 
rarchical strategy clustering genes based on 
their expression similarity. Here, we only con- 
sidered two major cortical inhibitory neuron 
groups: LHX6+ (excluding the CRABP1+ popu-
lation not detected in neocortex) cells and
NR2F2+/SP8+ cells. Regionally divergent genes
were obtained through Wilcoxon Rank Sum 
test with the Bonferroni-corrected P value 
threshold set at 0.01. Because cell numbers can
affect the number of differentially expressed genes,
we downsampled the cells in each region to the 
same level. For each region, we generated 20 
pseudobulk samples each containing 200 ran- 
dom cells and calculated the average expression 
of the regionally-divergent genes across all 
pseudobulk replicates. The resulting expression
matrix was used for hierarchical clustering, 
with the distance matrix defined as one minus 
Pearson correlation coefficients between 
gene pairs and clustering algorithm set as 
“ward.D2”. 
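
The pseudobulk averaging and the correlation-based distance described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the R code used in the study: all function and gene names are hypothetical, cells are drawn without replacement, and the Ward linkage step itself is not reproduced here.

```python
import random

def pseudobulk_means(cells, n_samples=20, n_cells=200, seed=0):
    """Average expression of randomly pooled cells.

    cells: list of per-cell expression vectors (one value per gene).
    Returns n_samples vectors, each the mean of n_cells cells drawn
    without replacement (all cells are used if fewer are available).
    """
    rng = random.Random(seed)
    n_genes = len(cells[0])
    samples = []
    for _ in range(n_samples):
        drawn = rng.sample(cells, min(n_cells, len(cells)))
        samples.append([sum(c[g] for c in drawn) / len(drawn)
                        for g in range(n_genes)])
    return samples

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def correlation_distance(profiles):
    """profiles: dict gene -> average expression across regions.
    Returns 1 - Pearson r for every ordered gene pair."""
    genes = sorted(profiles)
    return {(g, h): 1.0 - pearson(profiles[g], profiles[h])
            for g in genes for h in genes}

# Toy profiles across four regions (made-up values).
profiles = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}
dist = correlation_distance(profiles)

# Toy pseudobulks: 10 cells, 2 genes, 3 samples of 5 cells each.
cells = [[float(i), 2.0] for i in range(10)]
pb = pseudobulk_means(cells, n_samples=3, n_cells=5)
```

Genes with proportional profiles (G1 and G2) end up at distance close to 0, whereas anticorrelated genes (G1 and G3) end up at distance close to 2, so the distance groups genes by the shape of their regional pattern rather than by expression level.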


Lineage inference of the switch between 
neurogenesis and gliogenesis 


To explore the developmental transition from
neurogenesis to gliogenesis, we employed two
different approaches, unsupervised transcrip-
tomic clustering with the Seurat (49) analysis
pipeline and cell lineage tracing with the Monocle
(version 2) analysis pipeline (77), to analyze the
transcriptional association between radial glial
cells, excitatory neurons, and glial cells and to
define their lineage relationships. In the Monocle
analysis pipeline, we first used the Seurat
FindMarkers function to perform differential
expression analysis for every pair of cell subtypes
in each developmental stage. Subsequently, the
differentially expressed genes were pooled and
used to infer the cell trajectory. Following the
recommended workflow, we used the default
parameters suggested by Monocle, except that
the “DDRTree” method was used as the reduc-
tion model and batch correction was intro-
duced. To compute the pseudotime for each
cell along the cell trajectory tree, the tree root
was manually selected by analyzing the
tree structure and the distribution of the actual
ages of the cells. Lastly, we used the Seurat
FindMarkers function to perform differential
expression analysis for every pair of branches to
identify genes specific to each tree branch, which
served as a proxy for detecting the transcriptional
program that dominates the segregation and
emergence of each tree branch.


Global cross-dataset comparison for glial cells 


The glial cell types in the current study were 
compared to the external datasets, including 
datasets from prenatal humans, adult monkeys, 
and lifespan mice, to confirm the quality of cells 
and the precision of cell type annotation. For 
astrocytes, we defined three subtypes, which 
together with astrocyte precursor and glia 
precursor were compared among multiple brain 
regions to reveal their regional distribution. One 
external astrocyte dataset collected from differ- 
ent cortical layers in P14 mice (72) was compared 
with our astrocytes to show their laminar dis- 
tribution, and another external astrocyte dataset 
collected from dorsolateral prefrontal cortex in 
adult macaque (30) was compared with our
astrocytes to investigate the correspondence 
between developmental and adult stages. We
used the specScore script, as previously detailed
in (18), to compute a specificity score for each
gene in each subtype and correlated subtypes
across datasets using the specificity scores, with
the pairwise subtype similarity visualized by
alluvial plots. In addition, a cross-dataset
comparison was conducted using the above-
mentioned UMAP pipeline. First, the different
datasets were processed separately according
to the Seurat analysis workflow, which includes
the range of steps described above. Second,
fastMNN (51) was used to correct
unwanted batch variation, treating the
different datasets as the major source of sys-
tematic variation. Other integration methods,
including Harmony (52), were also
used to verify the results reported by
fastMNN; these analyses are not shown
where only a negligible difference in cell type
correspondence was observed.


Prediction of ligand-receptor-mediated cell-cell
communication across refined cortical areas 
in E93-110 


We used the E93 and E110 macaque data for
this analysis because more refined brain areas
were sampled at these stages and most neuronal
and glial cell types have emerged, in particular
astrocytes, which gradually increase in number
and interact with other cells. The communication
between cells can be inferred by correlating
the expression of ligands and the corresponding
receptor genes. We used CellChat (64) to com-
pute the averaged communication between
major cell types, including excitatory neurons,
interneurons, astrocytes, enIPCs, OPCs/oligoden-
drocytes, microglia, and endothelial cells. The
aggregation of cell subtypes into major cell
types allowed us to achieve high accuracy and
reduce single-cell noise. Cell-cell communi-
cation was computed in each brain region
separately, and subsequently the interac-
tion strength between each pair of cell types
was averaged among brain regions to pro-
vide a proxy of the general tendency of cell-
cell communication. 
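
The final averaging step can be illustrated with a short Python sketch. It assumes the per-region interaction strengths have already been computed (for example, exported from CellChat) and stored as dictionaries keyed by (sender, receiver) pairs; the data layout, the region names, and the choice to treat a pair missing from a region as strength 0 are assumptions made for this example only.

```python
def average_interactions(region_matrices):
    """Average interaction strength per cell-type pair across regions.

    region_matrices: dict region -> dict (sender, receiver) -> strength.
    A pair absent from a region is treated as strength 0 (an assumption).
    """
    pairs = set().union(*(m.keys() for m in region_matrices.values()))
    n = len(region_matrices)
    return {p: sum(m.get(p, 0.0) for m in region_matrices.values()) / n
            for p in pairs}

# Toy input with two hypothetical regions (made-up strengths).
regions = {
    "region_1": {("Astro", "ExN"): 0.4, ("ExN", "Astro"): 0.2},
    "region_2": {("Astro", "ExN"): 0.2},
}
avg = average_interactions(regions)
```

Treating missing pairs as zero pulls the average down for interactions detected in only a few regions; averaging only over the regions where a pair was detected would be an equally defensible convention.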


Compilation of brain disease risk gene lists 


We compiled disease-risk genes from DISGENET 
(https://www.disgenet.org/) (73) and filtered out
diseases with fewer than 30 associated risk
genes. Genes from “Mixed oligoastrocytoma”
and “oligodendroglioma” were combined in the 
“Mixed Oligoastrocytoma+Oligodendrogliomas” 
list (abbreviated as M. Oligoastr.+ Oligo.). Also, 
genes associated with any medulloblastoma 
were combined in the “Medulloblastomas” 
list. Multiple genome-wide association studies
(GWAS) were used to collect gene lists for the 
study: Alzheimer’s disease (AD) (74), anorexia 
nervosa (AN) (75), autism spectrum disorder 
(ASD) (76), bipolar disorder (BD) (77), intelli- 
gence quotient (IQ) (78), major depressive dis- 
order (MDD) (79), neuroticism (NEUROT) (80), 
Parkinson’s disease (PD) (81) and schizo-
phrenia (SCZ) (82). Genes identified by running
Multi-marker Analysis of GenoMic Annotation 
(MAGMA) (83) were also included for the fol- 
lowing conditions: attention-deficit/hyperactivity 
disorder (ADHD), AD, AN, ASD, BD, IQ, MDD, 
NEUROT, obsessive-compulsive disorder (OCD) 
(84), PD, SCZ and Tourette syndrome (TS) (85).
Only genes with a nominal p-value of less than 
0.05 were used and we selected the top 200 genes 
according to their p values. Genes implicated 
in ASD susceptibility were also obtained from 
the SFARI database (https://gene.sfari.org). 
Only genes of SFARI categories S (Syndromic), 
1 (high confidence), 2 (strong candidate) and
3 (suggestive evidence) were employed. Ad-
ditionally, we included high-confidence ASD
genes from (86) and a set of genes involved in
developmental delay (DD) from the Deciphering
Developmental Disorders Study consortium
(87). Here, we included all the regions for each
cell subtype in the majority of the disease risk
gene analyses, except when intersecting dis-
ease risk genes with region-specific subtype
markers. 
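
The two list-filtering rules described above (a nominal p-value cutoff followed by selection of the top 200 genes, and removal of diseases with fewer than 30 risk genes) can be sketched in Python as follows. The function names, the input layout, and the gene and disease names are hypothetical; ties in p value are broken alphabetically by gene name here, which is an arbitrary choice of this sketch.

```python
def top_genes(gene_pvalues, p_max=0.05, n_top=200):
    """Keep genes with p < p_max, then return the n_top smallest p values."""
    passing = sorted((p, g) for g, p in gene_pvalues.items() if p < p_max)
    return [g for p, g in passing[:n_top]]

def drop_small_lists(disease_lists, min_genes=30):
    """Discard diseases that have fewer than min_genes associated genes."""
    return {d: genes for d, genes in disease_lists.items()
            if len(genes) >= min_genes}

# Toy input (made-up p values and lists).
pvals = {"G1": 0.001, "G2": 0.2, "G3": 0.04, "G4": 0.0005}
selected = top_genes(pvals, n_top=2)

lists = {
    "disease_x": [f"G{i}" for i in range(40)],
    "disease_y": [f"G{i}" for i in range(12)],
}
kept = drop_small_lists(lists)
```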


Cell type enrichment of disease gene expression 


scRNA-seq data were preprocessed following
the guidelines for EWCE analysis (88). In brief, 
normalization and variance stabilization of 
scRNA-seq data using regularized negative 
binomial regression was performed with the 
sctransform R library (89). SCT-corrected counts 
were computed for genes being marker genes 
of any subtype with an adjusted p-value < 0.05 
and having a 1-to-1 matching human ortholog. 
Macaque-to-human orthologs were retrieved 
using the orthogene R library (https://github.com/
neurogenomics/orthogene). Then, we performed 
EWCE's bootstrap enrichment test of the dis- 
ease-associated gene lists previously defined 
(20000 repetitions, geneSizeControl=FALSE, 
controlledCT=NULL, and mtc_method=“BH”). 
Only the risk genes present in the dataset after
SCT normalization were tested. 
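
To make the logic of a bootstrap enrichment test with Benjamini-Hochberg (BH) correction concrete, a generic sketch is given below. It is written in Python and is not the EWCE implementation: the per-gene "specificity" values, the use of the mean as the test statistic, and all names are assumptions made for illustration.

```python
import random

def bootstrap_enrichment(specificity, target_genes, reps=20000, seed=0):
    """Empirical p value for enrichment of target_genes in one cell type.

    specificity: dict gene -> specificity of that gene for the cell type.
    Returns the fraction of random gene sets (same size as the target set)
    whose mean specificity is at least as high as the observed mean.
    Raises ZeroDivisionError if no target gene is present in specificity.
    """
    rng = random.Random(seed)
    background = sorted(specificity)
    hits = [g for g in target_genes if g in specificity]
    observed = sum(specificity[g] for g in hits) / len(hits)
    exceed = 0
    for _ in range(reps):
        draw = rng.sample(background, len(hits))
        if sum(specificity[g] for g in draw) / len(hits) >= observed:
            exceed += 1
    return exceed / reps

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy background of 100 genes with increasing specificity (made-up values).
spec = {f"g{i}": i / 100 for i in range(100)}
p_high = bootstrap_enrichment(spec, ["g97", "g98", "g99"], reps=2000)
p_low = bootstrap_enrichment(spec, ["g0", "g1", "g2"], reps=2000)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this toy example the three most specific genes give a very small empirical p value, whereas the three least specific genes give a p value close to 1.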


Identification of disease risk genes showing as 
organizer domain subtype markers 


Marker genes were computed using the 
Wilcoxon Rank Sum test on log2-normalized 
data by FindAllMarkers from the Seurat pack- 
age (49). We computed marker genes of all the 
cell subtypes and the organizer domain sub- 
types independently. In those subsets, expres- 
sion data were scaled. Averages of gene scaled 
expression were obtained for every gene in the 
disease-associated gene lists and we computed 
the median of averages per list. We also re- 
tained the number of genes from each list that 
were expressed in at least 10% of cells for each 
cell subtype. Regarding patterning centers’ 
marker genes, we were interested in those 
that were expressed distinctively in one of the 
PC subtypes. For that, we selected the marker 
genes requiring them to be expressed in at 
least 10% of cells in the respective PC subtype 
and in less than 10% of cells of other orga- 
nizer domain subtype cells. With those markers, 
we created 2x2 contingency tables for each 
disease-risk gene list and subtype with counts 
of genes being or not subtype markers, and 
genes being or not disease-associated. Using 
those tables, we estimated the log2 odds ratio
of being disease-associated and subtype marker 
using the R function fisher.test. Among those 
disease-risk genes showing as markers for an
organizer domain subtype (132 genes), we also
investigated whether their expression is re- 
stricted to the given organizer subtype versus 
other non-organizer cell subtypes. For this, we 
only considered the neural lineage cell sub- 
types (that is, all the subtypes excluding im- 
mune, red blood, vascular and mesenchymal 
cells). We required the genes to be markers
of a given organizer subtype and to be
expressed in no more than three other neural
lineage cell subtypes (genes expressed in more
than 10% of cells of a given subtype were con-
sidered to be expressed). This resulted in 26 genes
as shown in Fig. 7B. 
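
The contingency-table step can be sketched in Python as follows. The sketch is illustrative only and makes several assumptions: the gene universe and gene sets are hypothetical, the odds ratio is the simple sample estimate (a*d)/(b*c), whereas R's fisher.test reports a conditional maximum-likelihood estimate that differs slightly, and the one-sided p value is included only to show how the same table feeds Fisher's exact test.

```python
import math

def contingency(universe, markers, disease):
    """2x2 counts for marker status versus disease association.

    All arguments are sets of gene names; markers and disease are
    assumed to be subsets of universe.
    """
    a = len(markers & disease)             # marker and disease-associated
    b = len(markers - disease)             # marker only
    c = len(disease - markers)             # disease-associated only
    d = len(universe - markers - disease)  # neither
    return a, b, c, d

def log2_odds_ratio(a, b, c, d):
    """Sample log2 odds ratio; undefined when any of a, b, c, or d is 0."""
    return math.log2((a * d) / (b * c))

def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p value for over-representation."""
    n = a + b + c + d
    row, col = a + b, a + c
    total = math.comb(n, col)
    return sum(math.comb(row, k) * math.comb(n - row, col - k)
               for k in range(a, min(row, col) + 1)) / total

# Toy example with 20 hypothetical genes.
universe = {f"g{i}" for i in range(20)}
markers = {"g0", "g1", "g2", "g3", "g4"}
disease = {"g0", "g1", "g2", "g10"}

table = contingency(universe, markers, disease)
lor = log2_odds_ratio(*table)
pval = fisher_greater(*table)
```

For this toy input the table is (3, 2, 1, 14), giving a sample odds ratio of 21.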


Expression enrichment of disease risk genes 
across radial glial cells 


To gain a detailed view of the expression en- 
richment patterns of disease risk genes, we 
applied AUCell (67) enrichment analysis on 
the disease gene lists across neural stem cell
subtypes. To obtain a robust estimate for each
subtype and to avoid the influence of
low-quality cells, we generated 100
pseudobulk samples for each neural stem cell
subtype by randomly pooling 200 cells, followed
by calculating the average gene expression in each
of the pseudobulk samples. AUCell analysis
was performed directly on these pseudobulk
samples together with the disease gene lists,
with the results visualized on a heat map.
Diseases were ordered according to their
peak enrichment along the x axis (the sub-
types). Because many disease risk
genes showed high enrichment in outer radial
glial cells, we asked whether this was caused by
quality bias. Although the AUCell package is ro-
bust to data quality differences, as it uses a rank-
based method to assess the separability of
query genes versus background genes, we fur-
ther subset the data to have 2000 UMIs per
cell (cells with fewer than 2000 UMIs were not
included), generated pseudobulks, and per-
formed enrichment following the same strategy
as described above. Overall, this led to a similar
enrichment pattern. We next sought to identify
the genes underlying the dynamic disease
enrichment patterns. For each disease, we ex-
tracted its risk genes and correlated (Pearson
correlation) their expression patterns with the
given enrichment pattern of the disease. We
set a threshold of 0.7 to capture the strongest
signals, and the resulting genes were visualized in a
heat map. 
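
The depth-equalization control described above (bringing every cell to 2000 UMIs and discarding cells with fewer) can be illustrated with the following Python sketch. It assumes that each UMI is equally likely to be kept and that sampling is done without replacement; the function name and the toy counts are hypothetical and do not correspond to the code used in the study.

```python
import random

def downsample_cell(counts, target=2000, rng=None):
    """Randomly keep exactly `target` UMIs from one cell.

    counts: dict gene -> UMI count for the cell.
    Returns a new dict of downsampled counts, or None when the cell has
    fewer than `target` UMIs and is therefore excluded.
    """
    rng = rng or random.Random(0)
    if sum(counts.values()) < target:
        return None
    # Expand to one entry per UMI, then sample without replacement.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    kept = {}
    for gene in rng.sample(pool, target):
        kept[gene] = kept.get(gene, 0) + 1
    return kept

# Toy cells (made-up counts).
deep_cell = {"A": 1500, "B": 1000, "C": 500}
shallow_cell = {"A": 10}

ds = downsample_cell(deep_cell)
excluded = downsample_cell(shallow_cell)
```

Expanding the counts into a pool of individual UMIs is simple but memory-hungry; for large datasets a multivariate hypergeometric draw would give the same distribution more efficiently.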


Intersection of disease risk genes and 
region-specific signatures 


We set relatively high thresholds to identify 
disease genes showing prominent regional 
and cell-type enrichment. For each region, 
we calculated cell type markers and regionally- 
enriched genes using Wilcoxon Rank Sum test 
followed by setting the same thresholds for 
both analyses: Bonferroni-corrected p value
< 0.01, log fold changes of average expression 
> 0.25, expression ratio fold changes > 1.25, 
and expression ratio > 0.2. The intersection 
of the cell subtype markers and regionally-
enriched genes was further overlapped with
disease risk genes and the results were vi- 
sualized in fig. S19. The intersection of the top 
25 cell subtype markers and top 25 regionally- 
enriched genes ranked by fold changes of ex-
pression ratios was overlapped with disease
risk genes and visualized in Fig. 7D. 


Monkey fetal cortical NSC culture and 
in vitro differentiation 


Fetal macaque brains were isolated from E42 
and E77 fetuses. Telencephalic regions were iden- 
tified and dissected and single cells isolated in 
HBSS, as described above. After centrifugation, 
HBSS was removed and cells were switched 
to DMEM/F12 medium with N2 supplement
(described below). Monkey region-specific 
telencephalic cells were cultured using an 
adapted version of a protocol previously re- 
ported for mice (29). 3-5 x 10° cells were plated 
over 10 cm culture plates (Falcon, 35-3003), 
previously coated with poly-L-ornithine (PLO) 
(Sigma, P3655) and fibronectin (FN) (R&D 
Systems, 1030FN), and expanded at 37°C, 5% 
O2 and 5% CO2 for 5-6 days in DMEM/F12
medium (Mediatech 16-405-CV) plus N2 sup- 
plement, containing 25 ug/ml bovine insulin 
(Sigma, I6634), 100 ug/ml apotransferrin (Sigma,
T2036), 20 nM progesterone (Sigma, P8783), 
100 uM putrescine (Sigma, P5780), 30 nM 
sodium selenite (Sigma, S5261), penicillin/
streptomycin (Life Technology, 15140-122). 
We refer to DMEM/F12 medium with N2 sup- 
plement as N2 medium, hereafter. 20 ng/ml 
bFGF (R&D Systems, 4114-TC) was added daily 
and the medium was changed every other day. 
NSCs were lifted with Accutase (STEMCELL 
Technologies, 07920) and aliquots of 3-5 x 10° 
cells/500 ul of N2 were frozen at -80°C. The in 
vitro neurogenesis experiment shown in fig.
S13B was performed as follows: thawed
NSCs were expanded in the presence of 20 ng/ml
FGF2 for 5 days until confluence. Then region-
specific NSCs were dissociated with Accutase
and passaged into PLO/fibronectin-coated
6-well plates (Falcon, 35-3046), at a density of
300,000/well x 2 wells in N2 medium + 1 ng/ml
FGF2, which favors neurogenesis (29). Cells
were expanded from DIV 1 to 5 and FGF2
was added daily. Differentiation of the NSCs 
was induced at DIV 5 by FGF2 withdrawal, 
keeping the cells in N2. At DIV 7, N2 was 
replaced with NeuroBasal medium (NB) (Life 
Technology, 12348-017), containing 25 ug/ml 
insulin, 30 nM sodium selenite, Glutamax (Life 
Technology, 35050061), 1x B27 (Life Technolo-
gies, 17504-044), 10 ng/ml BDNF (R&D Sys-
tems, 248-BD) and 10 ng/ml NT-3 (R&D
Systems, 267-N3) until the end of experiment 
at DIV 17 or DIV 20. 


Bulk RNA-sequencing library preparation 


At the time points across the in vitro differen- 
tiation indicated in fig. S13B, total RNA was 
extracted from 2 different wells/condition of 
the monkey primary region-specific cells, using 
RNeasy Mini Kit (Qiagen, 74104), according to 
manufacturer’s protocol. RNA Seq Quality 
Control: Total RNA quality was determined 
by estimating the A260/A280 and A260/A230 
ratios by nanodrop. RNA integrity was de- 
termined by running an Agilent Bioanalyzer 
gel, which measures the ratio of the ribosomal 
peaks. Samples with RIN values of 7 or greater 
were considered for library prep. RNA Seq 
Library Prep: mRNA was purified from ap-
proximately 200 ng of total RNA with oligo-dT
beads and sheared by incubation at 94°C in
the presence of Mg2+ (Roche Kapa mRNA Hyper
Prep Cat. KR1352). Following first-strand syn-
thesis with random primers, second-strand syn-
thesis and A-tailing were performed with dUTP
for generating strand-specific sequencing libra-
ries. Adapters with 3’ dTMP overhangs
were ligated to library insert fragments. Library
amplification amplifies fragments carrying the
appropriate adapter sequences at both ends.
Strands marked with dUTP were not amplified.
Indexed libraries were quantified by qRT-PCR
using a commercially available kit (Roche KAPA
Biosystems Cat. KK4854) and insert size
distribution was determined with the Agilent
Bioanalyzer. Samples with a yield of ≥ 0.5 ng/ul
and a size distribution of 150-300bp were used
for sequencing. Flow Cell Preparation and Se- 
quencing: Sample concentrations were normal- 
ized to 1.2 nM and loaded onto an Illumina 
NovaSeq flow cell at a concentration that yields 
25 million passing filter clusters per sample. 
Samples were sequenced using 100bp paired- 
end sequencing on an Illumina NovaSeq6000 
according to Illumina protocols. The 10bp 
unique dual index was read during additional 
sequencing reads that automatically follow 
the completion of read 1. Data generated 
during sequencing runs were simultaneous- 
ly transferred to the YCGA high-performance 
computing cluster. A positive control (prepared 
bacteriophage Phi X library) provided by 
Illumina was spiked into every lane at a con- 
centration of 0.3% to monitor sequencing 
quality in real time. Data Analysis and Storage: 
Signal intensities were converted to individual 
base calls during a run using the system's Real 
Time Analysis (RTA) software. Base calls were 
transferred from the machine's dedicated per- 
sonal computer to the Yale High Performance 
Computing cluster via a 1 Gigabit network mount 
for downstream analysis. Primary analysis— 
sample de-multiplexing and alignment to the hu- 
man genome—was performed using Illumina’s 
CASAVA 1.8.2 software suite. 


Human and macaque telencephalic 
organoid culture 


The human iPSC line Y6 was provided by Yale 
Stem Cell Center. This cell line was generated 
from neonatal skin fibroblasts using Cyto- 
Tune™-iPS Reprogramming Kit (Invitrogen, 
A13780). Chromosome analysis was performed 
on cultured cells. Of the five metaphases ex-
amined, no structural or numerical abnorma-
lity was noted and the karyotype was consistent
with that of a female (46, XX) complement. The 
pluripotency of the cells was confirmed by 
teratoma assay. Macaque iPSC line was gen- 
erated by reprogramming E40 macaque lung 
fibroblasts using CytoTune-iPS 2.0 Sendai Re- 
programming kit (ThermoFisher Scientific, 
A16517) and authenticated by morphology 
and karyotyping. All human and macaque iPSC 
lines tested negative for mycoplasma con-
tamination, checked monthly using the MycoA-
lert Mycoplasma Detection Kit (Lonza). For
maintenance of pluripotency, cells were dis- 
sociated to single cells with Accutase (STEM- 
CELL Technologies, 07920) and plated at a 
density of 1 x 10° cells per cm2 in Matrigel
(BD, 354277)-coated 6-well plates (Corning, 
3516) with mTeSR1 (STEMCELL Technolo- 
gies, 85850) containing 5 uM Y27632, ROCK 
inhibitor (STEMCELL Technologies, 72302), 
as previously described (29). ROCK inhibitor 
was removed 24 hours after plating, and cells 
were cultured for another 4 days before the 
next passage. Telencephalic organoids were
generated by the directed differentiation pro- 
tocol as previously described (31, 90). Human 
and macaque iPSCs were dissociated into single 
cells using Accutase. Neural induction was 
directed by dual SMAD and WNT inhibition 
(91) using a neural induction medium com-
posed of 50% (v/v) DMEM/F12 (Thermo-
Fisher Scientific, 11330032), 50% (v/v) Neurobasal
medium (Thermo Fisher Scientific, 21103049), 
1% (v/v) N2 (Thermo Fisher Scientific, 17502048), 
2% (v/v) B27 minus vitamin A (Thermo Fisher 
Scientific, 12587010), 1% (v/v) MEM non-essential 
amino acid (Thermo Fisher Scientific, 11140050), 
1% (v/v) GlutaMAX (Thermo Fisher Scientific, 
35050061), 1% (v/v) Penicillin/Streptomycin 
(Thermo Fisher Scientific 15140122), 0.1 mM 
2-Mercaptoethanol (Sigma-Aldrich, M3701), 
1 ug/ml heparin (STEMCELL Technologies, 
07980). The dissociated cells were reconsti- 
tuted with the neural induction medium and 
plated at 10,000 cells per well in a 96-well v- 
bottom ultra-low-attachment plate (Sumitomo 
Bakelite, MS-9096V). To increase the cell sur- 
vival and aggregate formation, 10 uM Y-27632 
was added for the first day. For cortical or- 
ganoids (COs), cells were cultured with the 
neural induction medium supplemented with 
100 nM LDN193189 (STEMCELL Technologies,
72147), 10 uM SB431542 (Sigma-Aldrich, S4317),
and 2 uM XAV939 (TOCRIS, 3748) for the first 
4 days and then with the medium supple- 
mented with 100 nM LDN193189 and 10 uM 
SB431542 for the next 4 days. For medial gan- 
glionic eminence organoids (MGEOs), cells were 
cultured with the neural induction medium 
supplemented with 100 nM LDN193189, 10 uM 
SB431542, and 2 uM XAV939 for 8 days (90). 
After 8 days, both COs and MGEOs were trans- 
ferred to a 6-well ultra-low-attachment plate 
(Corning, CLS3471) and cultured with organoid 
growth medium on an orbital shaker (Thermo 
Fisher Scientific, 88881101) rotating at a speed 
of 90 rpm to enhance the nutrient and gas 
exchanges. COs were grown in organoid growth 
medium composed of 50% (v/v) DMEM/
F12, 50% (v/v) Neurobasal medium, 1% (v/v) N2, 
2% (v/v) B27 minus vitamin A, 1% (v/v) MEM 
non-essential amino acid, 1% (v/v) GlutaMAX, 
1% (v/v) Penicillin/Streptomycin, 0.1 mM 2- 
Mercaptoethanol, 2 ug/ml heparin, 2.5 ug/ml 
human insulin (Sigma-Aldrich, I9278), and
200 ng/ml laminin (Thermo Fisher Scientific, 
23017015). For MGEOs, 1X B27 plus vitamin A 
(Thermo Fisher Scientific, 17504044) was used 
for the organoid growth medium, and human 
SHH (R&D Systems, 1845-SH/CF) and 1 uM
purmorphamine (TOCRIS, 4551) were addi- 
tionally added for ventral specification. From 
day in vitro (DIV) 21, organoids were cultured 
with a neuronal maturation medium com-
posed of Neurobasal medium, 1% (v/v) N2, 2%
(v/v) B27 plus vitamin A, 1% (v/v) MEM non- 
essential amino acid, 1% (v/v) GlutaMAX, 1% 
(v/v) Penicillin/Streptomycin, 0.1 mM 2-Mer-
captoethanol, 2 ug/ml heparin, 1% (v/v) Che-
mically defined lipid concentrate (Thermo
Fisher Scientific, 11905031), 10 ng/ml BDNF 
(R&D System, 248-BD-025; or Peprotech, 
450-02), 10 ng/ml GDNF (Peprotech, 450-10), 
10 ng/ml NT3 (R&D System, 267-N3-025; or 
Peprotech, 450-03), 200 uM cAMP (Sigma- 
Aldrich, D0627) and 200 uM ascorbic acid 
(Sigma-Aldrich, A92902). For the regionali- 
zation of human COs and MGEOs, 100 ng/ml 
RSPO3 (R&D System, 3500-RS/CF) or 100 ng/ml 
FGF8b (R&D System, 423-F8/CF) were added 
to the organoid growth medium from day 8 
to day 21. For the GAL and GALP treatment, 
macaque COs were exposed to 30 ng/ml of 
human GAL (Phoenix Pharmaceuticals, 026- 
01), 30 ng/ml human GALP (Phoenix Pharma- 
ceuticals, 026-51), or both from DIV 46 to 60 
and collected on DIV 60. Human COs and 
MGEOs were exposed to 30 ng/ml GALP from 
DIV 8 to 21 and collected on DIV 35. Brain 
organoids were collected and fixed in 4% PFA 
for 24 hours at 4°C. Then, organoids were im-
mersed in step gradients of sucrose/PBS up to
30% for 2 days at 4°C, embedded in OCT and 
frozen at -80°C. Sections were prepared at 
12 um on a Leica CM3050S cryostat and stored 
at -80°C until use. 


Immuno-cytochemistry of the organoids 


Immuno-cytochemistry of the human and
monkey telencephalic organoids began with
rehydration of the frozen slides in PBS. Block-
ing was done for 1 hour at room temperature
(RT) in PBS with 10% normal
donkey serum (Sigma-Aldrich, D9663) plus
0.5% Tween-20. Incubation with primary
antibodies was performed overnight at 4°C in
PBS with 10% normal donkey serum plus 0.5%
Tween-20.
The following primary antibodies were used at 
the concentration indicated: PAX6 (BioLegend, 
PRB-278P; 1:200); NKX2-1 (ABCAM, ab76013; 
1:500); LMX1A (ABCAM, ab76013, 1:200); SP8 
(ABCAM, ab73494; 1:200); ZIC4 (Lsbio, LS- 
B9905-50; 1:100); Galanin (Millipore Sigma, 
AB2233; 1:100); GALR2 (ABCAM, ab188753; 
1:100); GALP (Novus biological, NBP2-84950; or 
Thermo Fisher Scientific, BS-11526R; 1:100); 
SOX2 (R&D Systems, AF2018; or MAB2018; 
1:200); HuC/D (Thermo Fisher Scientific, A- 
21271; 1:500); KI67 (ABCAM, ab15580; 1:200); 
GABA (Sigma-Aldrich, A2052; 1:100); CTIP2 
(ABCAM, ab18465; 1:1000). Secondary anti-
body incubation was performed in PBS for 1 hour
at RT. Secondary antibodies were Alexa Fluor
488-, 594-, or 647-conjugated AffiniPure Donkey
anti-IgG (1:200; Jackson ImmunoResearch).
Nuclei were counterstained with DAPI (Sigma, 
D8417). Finally, the slides were mounted using 
Vector mounting medium. 


In utero intraventricular injection 


Pregnant C57/BL6 mice were injected with 
pre-emptive analgesic Ethiqua (3.25 mg/kg BW),
30 min before surgery, followed by injection of
Ketamine-Xylazine mixture at 100 mg/kg and 
10 mg/kg B/W, respectively. After 10-15 min,
once the animals were insentient, they were
placed on their backs on a sterile surgical cloth,
kept over a heated pad (42°C cycle of 30 min
ON/OFF) throughout the procedure, and their
limbs were fixed on the cloth using tape. Then,
the ventral region of the animals was cleaned
with an alcohol pad. A straight incision was made
(~2-3 mm) using fine scissors. A surgical
gauze with a circular hole (2-3 mm diameter)
was soaked with sterile saline and kept over 
the site of incision. Using ring forceps, E11.5 
embryos were taken out from the belly re- 
gion and kept on gauze. In utero injection 
was performed with 100 ng of GALP peptide 
in InL (Phoenix Pharmaceuticals, 026-51; stock 
concentration: 1 ug/ml in PBS, diluted 1:10 in 
PBS + Fast Green 0.1%) injected into the lateral 
ventricle of the telencephalon through a glass 
capillary needle. The injection was seen going
through the lateral ventricle with the Fast Green.
After injection, the uterus was placed back in
the peritoneal cavity and the wound was sutured. A
triple antibiotic paste was applied to the site of 
suture and the analgesic Meloxicam (5 mg/kg 
B/W) was injected in the animal. The animals 
were placed on a heated pad (42°C) under close 
monitoring until fully recovered. Embryos were 
harvested 24 hours after the injection. Two 
hours before harvesting the embryos, 100 mg/
kg BW of EdU (Invitrogen, C10337) dissolved in
saline solution was injected intra-peritoneally 
in the pregnant animals. Then the mice were 
sacrificed and the embryos collected and fixed 
as described above. 


Immunohistochemistry of the mouse 
brain tissue 


Serial coronal sections (15 um) were collected 
from the frontal to the caudal telencephalon. 
Each slide contained multiple serial sections 
of the mouse brain. Immuno-histochemistry
(IHC) began with rehydration of the
frozen glass slides in PBS. Then the tissue was
incubated with blocking solution, 10% normal
donkey serum (NDS)/PBS including 0.1%
Tween-20 and 0.2% Triton X-100. Primary and
secondary antibodies were diluted
in blocking solution. Primary antibodies and
dilutions: goat anti-Sox2 (R&D Systems, AF2018),
dil: 1:200; rabbit anti-Ki67 (ABCAM, ab15580),
dil: 1:200. Secondary antibodies were Jackson
DyLight donkey anti-(species) 488, 543, 647,
and DAPI nuclear staining was Vector mount-
ing medium H-1200. EdU staining (Invitrogen,
C10337) was performed following the man-
ufacturer’s protocol. Briefly, after drying the
sections at room temperature for 10 min, the
slides were incubated in 0.5% Triton X-100 for
20 min, then washed twice in 3% BSA for 10 min.
Next, 500 ul of reaction cocktail was added on
each slide for 1 hour at RT. After that, slides were
washed in 3% BSA, 1 ml of DAPI (0.1 ug/ml)
was added for 5 min, and the slides were mounted
with Vectashield mounting medium.


Single-molecule RNA in situ hybridization 


RNA in situ hybridizations were performed by 
Advanced Cell Diagnostics, Newark, CA, using 
RNAscope™ technology. Paired double-Z
oligonucleotide probes were designed against 
target RNA using custom software. The probes 
used for rhesus macaque and mouse brain 
tissue samples are shown in table S11. RNA- 
scope LS Fluorescent Multiplex Kit (Advanced 
Cell Diagnostics, Newark, CA, 322800) was used 
with custom pretreatment conditions following 
the instruction manual. Fixed frozen monkey 
and mouse fetal brain tissue slides were ma- 
nually post-fixed in 10% neutral buffered forma- 
lin (NBF) at room temperature for 90 min. Then 
the slides were dehydrated in a series of ethanols 
and loaded onto the Leica Bond RX automated 
stainer, which performed the reagent changes, start-
ing with the pretreatments (protease), followed
by the probe incubation, amplification steps, 
fluorophores, and DAPI counterstain. RNAscope 
2.5 LS Protease III was used for 15 min at 40°C. 
Pretreatment conditions were optimized for 
each sample and quality control for RNA 
integrity was completed using probes specific 
to the housekeeping genes Polr2a, Ppib, and Ubc, 
which are low-, moderate-, and high-expressing
genes, respectively. Negative control background 
staining was evaluated using a probe specific to 
the bacterial dapB gene. Coverslipping was 
done manually using ProLong Gold mount- 
ing media at the end of each run. 


Microscopy and imaging 


Fluorescent monkey and mouse brain tissue 
specimens and organoid sections were imaged
using a Zeiss LSM800 confocal microscope, or a 
Zeiss 510 Meta confocal microscope. Z-stack 
and tiled confocal images of brain slices and 
organoids were processed using Zeiss ZEN2009 
and ImageJ (v.2.0.0-rc-69/1.52p). Slight artefac-
tual defects of DAPI intensity were manually
corrected with ImageJ. When necessary, fluo-
rescence intensity or contrast was slightly ad-
justed using the same parameters for all the 
specimens. 


High-throughput quantification 


Multiple images of the organoids and mouse 
sections were analyzed in batch using Volocity 
(v.6.3.1). The fluorescence intensities of all the 
channels were analyzed at the single-cell level,
with objects segmented based
on DAPI. Then Spotfire (v.12.3.0, TIBCO) soft-
ware was used to identify the cells expressing
high levels of a specific protein, setting the 
average or the third quartile (Q3) intensities 
of all objects as threshold. The final statistical 
analysis and visualization were performed 
using GraphPad Prism9. 
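
The thresholding rule (calling a cell high-expressing when its intensity exceeds the mean or the third quartile of all objects) can be illustrated with a short Python sketch. This is not the Volocity/Spotfire workflow itself: the function name and toy intensities are hypothetical, and the quartile is computed with Python's statistics.quantiles, whose default interpolation may differ from the one used by Spotfire.

```python
import statistics

def high_expressing(intensities, method="q3"):
    """Flag objects whose intensity exceeds a data-derived threshold.

    intensities: per-object (per-cell) fluorescence values for one channel.
    method: "mean" or "q3" (third quartile of all objects).
    Returns (threshold, list of booleans in input order).
    """
    if method == "mean":
        threshold = statistics.mean(intensities)
    elif method == "q3":
        threshold = statistics.quantiles(intensities, n=4)[2]
    else:
        raise ValueError("method must be 'mean' or 'q3'")
    return threshold, [x > threshold for x in intensities]

# Toy intensities for eight segmented nuclei (made-up values).
values = [10, 20, 30, 40, 50, 60, 70, 200]
thr_mean, flags_mean = high_expressing(values, method="mean")
thr_q3, flags_q3 = high_expressing(values, method="q3")
```

Because a single bright outlier can raise the mean considerably, the quartile-based threshold is usually the more stable of the two choices for skewed intensity distributions.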
INTRODUCTION: The adult human brain is di-
vided into hundreds of spatial domains, each
comprising tens or hundreds of distinct neuro- 
nal, glial, and other cell types. This complex 
arrangement of cells is initially established 
during the first trimester of development, yet 
the difficulty of accessing such early embryos 
has hindered detailed molecular analysis. Dis- 
secting the spatial, temporal, and transcrip- 
tional changes that occur in the whole brain 
during the first trimester promises to reveal the 
fundamental blueprint of the human brain. 


RATIONALE: To comprehensively map brain 
cell types and gene expression trajectories dur- 
ing the first trimester, we collected 26 brain 
specimens spanning 5 to 14 postconceptional 
weeks (pcw) that were dissected into 111 dis- 
tinct biological samples. Each of these samples 
was subjected to single-cell RNA sequencing, 
resulting in a collection of 1,665,937 high- 
quality single-cell transcriptomes. These data 
were complemented by a spatial transcrip-
tomic analysis at 5 pcw using highly multi-
plexed RNA fluorescence in situ hybridization
(FISH) and spatial transcriptomics. We iden-
tified 616 clusters, which we annotated with
metadata, including class and subclass, spatial
location, embryonic age distribution, and spe-
cific gene expression markers.


Ka Wai Lee, Elin Vinsland, Peter Lonnerberg,
a, Joakim Lundeberg, Roger A. Barker,


RESULTS: The detailed resolution of the data- 
set allowed us to characterize general princi- 
ples of brain development as well as delineate 
the differentiation trajectories of several brain 
regions. The developing excitatory neuron line- 
ages in the neocortex revealed three different 
ongoing molecular programs: differentiation 
from radial glia to neurons, cell cycle, and matu- 
ration. We found a delicate balance between 
progenitor and differentiation factors in inter- 
mediate progenitor cells (IPCs), with the in- 
duction of neurogenic transcription factors 
visible after the G1 cell cycle phase. Our find-
ings support a conserved progressive transcrip-
tional maturation in older specimens. Many 
genes were induced in late radial glia and 
glioblasts, making up a program that drives 
progenitors toward neurogenesis as well as 
gliogenesis. In the forebrain γ-aminobutyric
acid-mediated (GABAergic) neuronal lineage, 
we found evidence of migration of CRABP- 
expressing cells from the medial ganglionic 
eminence into the thalamus, which are pre- 


in the adult. Examining ventral midbrain devel-
opment, we found a diverse set of progenitors
already arising at 8 pcw, defining broad TH
class identity, although adult TH subtype iden-
tities must arise after 14 pcw. Focusing on
developing glia, we found a large set of region- 
specific glioblasts, of which most showed evi- 
dence of maturation into astrocytes. This 
provides a plausible mechanism for the spec- 
ification of adult region-specific astrocyte 
types. We further identified oligodendrocyte 
precursor cells (OPCs) specific to the forebrain, 
midbrain, and hindbrain that expressed large 
numbers of functionally conserved genes. 


CONCLUSION: Although previous studies have 
explored specific regions of the brain during 
development, this is the first known compre- 
hensive study of the whole human brain during 
the crucial first trimester. We found that al- 
though neurons were the most diverse, both 
pre-astrocytes and OPCs were regionally dis- 
tinct, and their gene expression suggests region- 
and cell type-specific supportive functions. 
These findings highlight the importance of 
early patterning events and provide a rich re- 
source for the interpretation of the many brain 
disorders that show region-specific patterns of 
occurrence or severity and for identifying thera- 
peutic targets for human disorders that affect 
specific brain cell populations.

Comprehensive cell atlas of the first-trimester
developing human brain

BRAIN CELL CENSUS


The adult human brain comprises more than a thousand distinct neuronal and glial cell types, a diversity 
that emerges during early brain development. To reveal the precise sequence of events during early 
brain development, we used single-cell RNA sequencing and spatial transcriptomics and uncovered cell 
states and trajectories in human brains at 5 to 14 postconceptional weeks (pcw). We identified 12 major 
classes that are organized as ~600 distinct cell states, which map to precise spatial anatomical domains 
at 5 pcw. We described detailed differentiation trajectories of the human forebrain and midbrain and
found a large number of region-specific glioblasts that mature into distinct pre-astrocytes and pre-
oligodendrocyte precursor cells. Our findings reveal the establishment of cell types during the first
trimester of human brain development.


The human brain develops by patterning
of the neural tube through specification,
differentiation, and maturation events that
yield several thousand types of cells. Single-
cell studies have explored specific regions
during human brain development, particularly
the neocortex (1-3). In this work, we performed
RNA sequencing of single cells from 26 brain
specimens spanning 5 to 14 postconceptional
weeks (pcw) (fig. S1 and table S1). From 111
unique biological samples, we retained 1,665,937
high-quality cells (fig. S2 and Methods), which
resulted in 616 clusters that we annotated
with metadata, including class, subclass, spa-
tial location, embryonic age distribution, and
specific markers (table S2).

Cells of the neuronal lineage dominated 
(Fig. 1, A to C, and fig. S3): radial glia (express-
ing HES1; fig. S3, A to C), neuronal intermediate
progenitor cells (IPCs) (defined as neuroblasts
and neurons expressing cell cycle genes), neuro-
blasts (expressing NHLH1), and immature neu-
rons (expressing INA). Later, radial glia matured
into putative glioblasts (defined by expression
of TNC and BCAN) and oligodendrocyte pre-
cursor cells (OPCs) (expressing PDGFRA and
OLIG1). Non-neural cell populations included
vascular cells (endothelial cells, pericytes),
erythrocytes, immune cells, fibroblasts, and
cells derived from the placodes and the neural 
crest. 

The abundance of neurons and neuroblasts 
remained relatively constant, whereas the prev- 
alence of neuronal IPCs appeared to increase 
slightly by age, as did cells of the immune, 
oligodendrocyte, and vascular lineages (Fig. 1B). 
However, the major change over the time span 
studied was the switch from radial glia to glio- 
blast, around 10.5, 9.5, and 7.5 pcw in the fore- 
brain, midbrain, and hindbrain, respectively 
(Fig. 1B). 

Radial glia, neuronal IPCs, neuroblasts, and 
neurons all showed strong regional patterning 
(Fig. 1D and fig. S3, D and E). In addition, glio- 
blasts showed distinct subtypes corresponding 
to each developmental compartment, and OPCs 
were patterned along the anterior-posterior axis 
(Fig. 1D and fig. S3F). By contrast, immune, 
vascular, and blood cells lacked strong region 
specificity (fig. S3F). 

We spatially mapped cell types at 5 pcw 
using multiplexed RNA enhanced electric fluo- 
rescence in situ hybridization [EEL FISH (4)] 
targeting 440 selected genes in three relatively 
medial sagittal sections of a single embryo 
(Fig. 2, fig. S4, and table S3), complemented 
by transcriptome-wide spatial transcriptomics 
on eight sections sampled more widely (fig. 
S4E). We annotated anatomical domains (Fig. 
2B and fig. S4, A to C) and aligned 64 clusters 
to spatial regions using BoneFight (5) (Fig. 2C 
and fig. S4D). 

Quantification of gene expression along the 
anterior-posterior axis revealed the anatomi- 
cally restricted expression of many genes, where 
the domains largely coincided with anatomical 
landmarks (fig. S5A). Only a few measured genes
were restricted to a single prosomere at this
time point (SIX6, OLIG3), whereas most were
expressed in combinations (Fig. 2A) or gra-
dients (FOXG1, RSPO3, SIX3; fig. S5, A and B).
Dorsoventral symmetry was observed for sev-
eral genes mostly in the midbrain and isthmus
(EN1, EN2, PAX8) but also between telence-
phalon and diencephalon (FEZF2, EMX2),
whereas other genes clearly broke this sym-
metry (FOXG1, RSPO3, FGF17, PAX5) (fig. S5B).


Excitatory lineage of the developing neocortex 


We computationally isolated cortical excita-
tory neurons by EMX1 expression (fig. S6A),
revealing a single main lineage, as well as a 
minor branch of Cajal-Retzius cells (Fig. 3, A 
and D, and fig. S6, B to G). Radial glia and IPCs 
each formed loops on the two-dimensional 
embedding, which corresponded to the cell 
cycle. Superimposed on each cell cycle loop 
was an inside-out maturation gradient, with 
older cells located at the outside of the loop 
(Fig. 3I). Each postconceptional week showed
similar patterns (Fig. 3B and fig. S6B). Radial 
glia, IPCs, and neuroblasts were already de- 
tected at 5 pcw, whereas neurons and glio- 
blasts were not detected until 6 pcw (Fig. 3, B 
and C, and fig. S6C). 

At all ages, we observed short trajectories 
bridging radial glia to IPCs and IPCs to dif- 
ferentiating neurons, in each case linking cells 
right after mitosis of a previous stage to the 
beginning of the next stage (Fig. 3, A and D, 
and fig. S6B). (Fig. 3D and fig. S6G). Noncycling 
IPCs might plausibly correspond to those radial 
glia that were differentiating directly into 
neurons. Thus, progenitors commit to another 
division, or progress to the next stage of dif- 
ferentiation, just after dividing. 

A recent study (6) identified two transcrip- 
tional subtypes of IPCs: radial glia-like IPCs 
that express SOX2 and neuronal-like IPCs that 
express neuronal genes. To explore these sub- 
types, we resolved the cell cycle trajectory for 
IPCs in samples from two specimens (8.5 and
14 pcw) using DeepCycle (7), defining an offset
of each cell along the cell cycle periodic tra-
jectory (Fig. 3E and fig. S6, H to J). The ex-
pression of the neuronal genes NEUROD6
and BCL11B was increased after late G1, con-
sistent with previous findings (8), whereas
SOX2 was down-regulated after the S phase
(Fig. 3, E and F, and fig. S6, K to L). These pat-
terns were also observed at other ages (fig.
S6M). Reembedding the EMX1-expressing IPCs
after removing the cell cycle effect (see Meth-
ods) revealed opposite trends for NEUROD6 and
SOX2 (fig. S6, N and O). NEUROD6+ neuronal-
like cells showed higher RNA molecule counts
and a larger number of expressed genes in
most of the examined conditions when com-
pared with SOX2+ radial glia-like cells (Fig. 3G
and fig. S6, P to R).
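
The notion of an offset along a periodic trajectory can be illustrated with a deliberately simple geometric sketch. This is not the DeepCycle method and does not use its interface; it only assumes that cycling cells form a loop in a two-dimensional embedding and measures each cell's angle around the loop centroid. The function name `loop_phase` and the toy coordinates are hypothetical.

```python
import math


def loop_phase(points):
    """Map 2D points lying on a loop to a phase in [0, 1).

    The phase is the angle of each point around the centroid of all
    points, rescaled so that one full turn corresponds to 1.0.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    phases = []
    for x, y in points:
        angle = math.atan2(y - cy, x - cx)  # angle in (-pi, pi]
        phases.append((angle % (2 * math.pi)) / (2 * math.pi))
    return phases


# Four hypothetical cells placed evenly around a unit circle
cells = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
phases = loop_phase(cells)
```

With such a phase in hand, expression of a gene such as NEUROD6 or SOX2 could be plotted against the offset to look for phase-dependent changes; where the zero point and direction of the cycle lie would still have to be fixed from known phase markers.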

In the neuronal-like IPCs, 138 genes were
up-regulated, including known neuronal and

Fig. 1. Overview of the developmental timecourse across the embryonic-fetal brain.
(A) A t-distributed stochastic neighbor embedding (tSNE) plot showing major cell classes.
(B) (Top) The proportion of cell classes by age, corrected for sampling by morphometric
volume normalization (see Methods). Data for 13 and 14 pcw were omitted because of
incomplete sampling at those ages. (Bottom) A schematic representation of major cell
types in the study and their progression through differentiation (left) and proportion of
glioblasts and radial glia by developmental age and brain region (right). (C) Metadata for
all 617 clusters, including age (using saturation to show distribution, normalized per
cluster), regional distribution (using color saturation to show fractions, normalized per
cluster), common cell type markers (using linear color gradients to represent linear
expression from minimum to the maximum value per gene; zeros are shown in gray), and
class [colored as in (A)]. (D) tSNE plots showing the regional identity of cells belonging
to each major class. Embryo image in (A) was used with permission from the HDBR Atlas
(https://hdbratlas.org).


Braun et al., Science 382, eadf1226 (2023) 13 October 2023 2 of 14 




Fig. 2. Spatial gene expression at 5 pcw. (A) Scatter plot showing mRNA molecules detected in the
neural tube of a single sagittal section, colored by gene identity (molecules outside the neural tube are not 
shown). Prosomeres are indicated, and approximate gene expression domains are shown as ribbons. M.H.B., 
midbrain-hindbrain boundary. (B) Manually curated anatomical domains following the prosomeric model 
plotted on top of the total RNA of the tissue (gray). The inset shows the expression of markers of the 
ventricular, subventricular, and mantle zones that are shown for a segment of the ventral hindbrain. (C) (Left) 
Single-cell clusters assigned to spatial domains using BoneFight. (Right) Heatmap showing the expression of 
the same genes as in (A) across aligned clusters. The perceptually linear colormap shows linear gene 
expression from the minimum to the maximum value per gene, with zeros shown in gray. MZ, mantle zone; 
SVZ, subventricular zone; VZ, ventricular zone. 


differentiation-related markers BCL11B, STMN2,
TBR1, and NEUROD2. Many of these genes also
exhibited higher expression in neuroblasts and
neurons (fig. S6, S to U, and table S4). In the
radial glia-like IPCs, 200 genes were up-
regulated, including SOX21, FGFR2, and FOXP1,
which promote cell cycle reentry (9-11); many of
these genes were also up-regulated in radial
glia. These observations are consistent with the
“cell cycle length model” of differentiation (12).

Next, to understand the maturation axis of 
cortical development (Fig. 3I), we used Milo
(see Methods) to define the differential abun-
dance of cells from either early or late ages
between small neighborhoods of cells (13)
(Fig. 3J). Hundreds of genes changed signif- 
icantly in each class between these states 
(Fig. 3, K and L; fig. S7, C and D; table S5; and 
Methods). Although most of the genes were 
differentially expressed in one or two class types 
(fig. S7C), 28 genes showed higher expression 
in either the early or late state throughout the 
lineage (Fig. 3L). These include 77R, DDIT4, and 
LEF1 in the early state and FABP7, POU3F2,
ATRALI in the late state. 

Seven hundred twenty-eight genes were dif- 
ferentially expressed between early radial glia 
and late radial glia or glioblasts. Most were 
associated with late radial glia and were en- 
riched in genes related to extracellular matrix, 
focal adhesion, and axon guidance (fig. S7E 
and table S6). After removing the cell cycle 
effect, reclustering resulted in two distinct clus- 
ters representing early and late radial glia 
and glioblasts (fig. S7F), further supporting a 
substantial transcriptional change between these 
cells, as has been previously observed in mice 
(5, 14). Mouse (14) and human early- and late- 
related genes overlapped extensively, especially 
for the late genes (Fig. 3M). Notably, some of 
these conserved genes were previously defined 
as basal radial glia markers, including TNC, 
HOPX, and FAM107A (15). Because basal radial
glia are rare in mice, our findings suggest that 
these genes are part of a broader late radial 
glia- and glioblast-related program and not 
specific to basal radial glia. 


Development and migration of forebrain 
GABAergic neurons 


At the tips of the DLX2+ medial ganglionic
eminence (MGE) and caudal ganglionic emi-
nence (CGE) trajectories, we identified neu-
rons that were dissected from the cerebral
cortex and the hippocampus, indicating that
they had migrated there, which is a finding
consistent with previous studies (16) (figs. S8A
and S9). Cortical MGE-derived neurons were
detected at 6 pcw and onward, whereas CGE-
derived neurons were observed starting at
9 pcw (fig. S9, H and J).

When we broadened the scope to clusters 
that expressed DLX2 from the whole brain 
(fig. S8, B to D, and Methods), we found that 
at 6 and 8 pcw, both hypothalamus and thala-
mus contained DLX2-positive cells, but most
did not express FOXG1 and thus were likely
generated locally (figs. S8, B and C, and S10,
A to E). The progenitors at 6 pcw expressed
FOXD1, which suggests that they correspond
to a mouse population that gives rise to
Sox14−Pvalb+ thalamic interneurons (17).

At 14 pcw, however, the hypothalamus sam-
ple contained glioblasts that expressed FOXG1
and SPARCL1 and postmitotic cells that ex-
pressed DLX2 and SST (figs. S8D and S10, F
and G); these might be hypothalamus-specific
cells that originated in the telencephalon.

Fig. 3. The excitatory neuron lineage in the developing neocortex.
(A) Uniform manifold approximation and projection (UMAP) of pallial excitatory
neuron lineage colored by major cell classes. (B) UMAP of selected ages colored
by major cell classes. (C) Distribution of major cell classes by age. (D) UMAP
colored by progenitor state. (E) (Left) Progenitor states for neuronal IPCs
and early neuroblasts at 8.5 pcw inferred by DeepCycle transcriptional phase.
(Right) Unique molecular identifier (UMI) counts per cell as a function of the
transcriptional phase. Cells expressing NEUROD6 are colored brown. The vertical
dashed line labeled “M” indicates apparent mitosis. The inset is a small box plot
that shows up-regulation of NEUROD6 after G1 (one-sided Mann-Whitney-
Wilcoxon test with Bonferroni correction; ****P < 0.0001). (F) UMAP colored
by SOX2 and NEUROD6 expression. (G) Box plot comparing UMI counts in radial
glia (RG)-like IPCs versus neuronal-like IPCs in Chromium version 3 samples.
****P < 0.0001; ns is not significant. (H) Schematic showing that the emergence
of neuronal-like IPCs happens late or after G1. These cells differentiate after
mitosis into two neuroblasts, whereas radial glia-like IPCs continue to
proliferate. (I) UMAP colored by age. (J) Differential abundance across age. For
Milo neighborhood embedding, each point represents a neighborhood, and the
size of the points is proportional to the number of cells in the neighborhood.
Neighborhoods are colored by their log-fold change in abundance between early
and late ages. (K) (Left) Examples of early- and late-related genes. (Right) Ratio
of gene-expressing cells (radial glia for HOPX and neuroblasts for NHLH2) by
age. (L) Examples of early- and late-related genes throughout the lineage.
(M) Overlap between human and mouse early (left) and late (right) radial
glia-associated genes. (N) Differentially expressed genes between early and late

Similarly, the thalamus sample at 14 pcw con- 
tained many cells that expressed MGE marker 
genes and FOXG1 (figs. S8D and S10, F to I).
We had only obtained one thalamic specimen 
at this stage, and hence we cannot rule out the 
possibility that these cells were misdissected. 
However, our results are consistent with a pre- 
vious report of neuronal migration from the 
MGE to the thalamus between 15 and 26 weeks 
of gestation in humans (18). Moreover, after ex-
amining recently described MGE-derived thala-
mic interneurons in the adult human brain (19),
we confirmed the presence of MGE-derived
thalamic PVALB+ neurons expressing FOXG1.
Thus, we further suggest that these cells give
rise to PVALB+ interneurons in the thalamus.


The developing human ventral midbrain and 
dopaminergic neurons 


From 17 midbrain samples (5 to 14 pcw),
246,500 neural cells were collected and anno-
tated (Fig. 4A and fig. S11, A to D). To define
the ventral midbrain, FOXA2+;FOXA1+ and/or
TH+ clusters and their neighbors were selected
and reclustered (Fig. 4, B and C). Cell cycle
score and RNA Velocity defined prolifera-
tive cells and three major axes of differen-
tiation, respectively (Fig. 4C and fig. S11E).
Each axis followed a consistent sequence of
ages and cell types (progenitor-neuroblast-neu-
ron; Fig. 4D and fig. S11, F to H). The floor plate
(FOXA2+;LMX1A+), which gives rise to dopa-
minergic neurons, contained NGN2+ neuronal
progenitors and HEPACAM2+;WNT1+ postmi-
totic medial neuroblasts, whereas the basal
plate (FOXA2+;NKX+), which generates red
nucleus and motor neurons, contained NGN1+
and NGN2+ progenitors, as well as HEPACAM2+,
but not WNT1+, neuroblasts (Fig. 4, E, G, and H,
and fig. S11, I to O). The floor plate could be
further subdivided into anterior (PITX2+) and
posterior (EN1+) sections (Fig. 4, F to H), con-
sistent with cells that belong to the subtha-
lamic or midbrain dopaminergic lineages (Fig.
4, I to K, and fig. S11, P to T). SOX6 was ex-
pressed along the entire LMX1A+;FOXA2+;EN1+
lineage. By contrast, only a fraction of TH+ neu-
rons were CALB1+, most of which were SOX6-
negative (Fig. 4L). Reclustering of TH+ neurons
revealed 20 distinct subtypes expressing well-
known dopaminergic markers and distinctive
profiles (Fig. 4, M and N, and fig. S11U). Adult
TH+ neuron class markers [SOX6, CALB1, and
GAD2 (19)] emerged in a largely complementary
pattern by 5 to 8 pcw (Fig. 4, O and P). However,
the 14 adult TH+ neuron subtype signatures
shown in a complementary study (19) were
not found in embryonic TH+ neurons, which
expressed CXCR4, underlining their immature
migratory nature (Fig. 4O). Thus, whereas em-
bryonic TH subtype identity and adult broad
TH class identity are defined by 8 pcw, adult
TH subtype identities emerge later.


Region-specific glioblasts and pre-astrocytes 


BCAN (20) and TNC together identified puta- 
tive glioblasts in all brain regions (Fig. 1D), 
although the transition from radial glia to 
glioblast appeared gradual rather than sudden 
(Fig. 3). We reembedded 44 glioblast clusters
together with oligodendrocyte lineage cells
to reveal their transcriptional relationships
(Fig. 5 and fig. S12). All types of glioblasts
emerged earlier in the hindbrain, followed 
by emergence in the midbrain and forebrain 
(Fig. 5C). They were observed from 6 pcw, 
and their heterogeneity was dominated by spa- 
tial rather than temporal differences (Fig. 5A 
and fig. S12B). 

About half of all glioblasts additionally ex-
pressed the astrocyte-specific water channel
AQP4 and the gap-junction protein Connexin-43
(encoded by GJA1); we defined these cells as
pre-astrocytes (fig. S12D). Pre-astrocytes were found in the
diencephalon, midbrain, and hindbrain and 
retained region-specific transcriptional iden- 
tity. Telencephalic pre-astrocytes likely develop 
after the latest time point sampled in this work, 
as suggested by low AQP4 expression in late 
glioblasts from that region (fig. S12D). 

One glioblast cluster (fig. S12A) represented 
pre-OPCs, based on the expression of EGFR 
and DLL3 (20), and it included cells from 
all major brain regions. Compared with pre- 
astrocytes and pure glioblasts, pre-OPCs emerged 
later and in smaller numbers (Fig. 5B). 


OPCs are defined by their developmental 
tissue origin 


Adult oligodendrocytes are morphologically 
and transcriptionally diverse. In this work, we 
found that their developing progenitors, OPCs, 
exhibited distinct anteroposterior transcrip- 
tional identities (Fig. 6A). In the hindbrain, 
OPCs emerged by 6 pcw, followed by a gradual
increase in the midbrain and forebrain around
8 pcw (Fig. 6B), which is consistent with, and
even 2 weeks earlier than, what was recently
found in the forebrain (20, 21). Committed OPCs
(COPs) appeared predominantly in the hind-
brain as early as 6 pcw (Fig. 6B). At the earliest,
we found OPCs at 5.5 pcw in the hindbrain,
which is a sign of an earlier maturation along 
the oligodendrocyte lineage than has previously

states. Selected genes for every class are shown. Significant changes are marked
gray. Mean expression values are scaled for each gene between 0 and 1. For box
plots in (E) and (G), the center line represents the median, box limits are upper and
lower quartiles, and whiskers extend to show the rest of the distribution, except
for points that are determined to be “outliers.”


been reported in humans. By contrast, no ma- 
ture oligodendrocytes were observed, suggest- 
ing that OPCs and COPs are arrested in their 
lineage progression and potentially serve a 
function other than myelination during this 
critical developmental period. 

Developmental patterning genes were diffe- 
rentially expressed between regions for both 
OPCs and COPs (see Fig. 6, C to E; fig. S13C; 
table S7; and Methods), including FOXG1
(forebrain), EN (midbrain-hindbrain bound- 
ary), and HOXD3 (hindbrain) (Fig. 6D and fig. 
S13, C and D). Additional differentially ex- 
pressed genes coded for transcription factors, 
ion channels, cell adhesion molecules, and 
synaptic proteins (Fig. 6, D and E; fig. S13C; 
and table S7). Although both clusters of cycling 
OPCs were dominated by cell cycle genes, 
they retained a transcriptional forebrain- and 
hindbrain-specific profile (fig. S13, B and F), 
which suggests that there are multiple sources 
of proliferating OPCs that give rise to subse- 
quent maturing OPCs. 

Region-specific genes in radial glia, OPCs, 
and neurons were predominantly distinct. For 
example, of the 225 forebrain-specific genes 
in OPCs, only 22 were shared with radial glia, 
only 33 were shared with neurons, and only 
13 were common to all three classes (Fig. 6E), 
with similar patterns observed in the mid-
brain and hindbrain. For OPCs, some of these
genes encode known patterning transcription
factors, such as FOXG1 (forebrain) and GATA3
(midbrain), as well as HOX genes (hindbrain).
A number of
long noncoding RNAs (for example, LINC01551
in the forebrain, OTX2-AS in the midbrain, and
LINC02381 in the hindbrain) were also com-
mon to all classes, likely because of their ge-
nomic proximity to patterning genes. Thus, a
few canonical patterning genes underlie the 
regionalization of radial glia, OPCs, and neu- 
rons, whereas most region-specific genes are 
specific to each class, reflecting cell type-specific 
functions in each region and subregion. 
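
The overlap counts reported above amount to set intersections between per-class lists of region-specific genes. A minimal sketch in Python follows; the function name `shared_counts` and the toy gene lists are hypothetical (only FOXG1 is taken from the text), and whether the published pairwise counts include the genes common to all three classes is not stated, so this sketch simply reports plain intersections.

```python
def shared_counts(opc, radial_glia, neurons):
    """Count region-specific OPC genes that recur in other cell classes."""
    opc, radial_glia, neurons = set(opc), set(radial_glia), set(neurons)
    return {
        "opc_total": len(opc),
        "shared_with_radial_glia": len(opc & radial_glia),
        "shared_with_neurons": len(opc & neurons),
        "shared_by_all": len(opc & radial_glia & neurons),
        "opc_only": len(opc - radial_glia - neurons),
    }


# Toy forebrain gene lists; only FOXG1 comes from the text, the rest are placeholders
opc_genes = ["FOXG1", "geneA", "geneB", "geneC"]
radial_glia_genes = ["FOXG1", "geneA", "geneX"]
neuron_genes = ["FOXG1", "geneB", "geneY"]
counts = shared_counts(opc_genes, radial_glia_genes, neuron_genes)
```

In this toy case a single patterning gene is common to all three classes, while each class also keeps genes of its own, mirroring the pattern described in the paragraph above.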


Conserved and divergent properties of OPCs 


We wondered whether the transcriptional het- 
erogeneity of OPCs was conserved between spe- 
cies. Despite differences in sampling depth, the 
experimental methods used, and clustering al- 
gorithms, we were able to match OPCs, cycling 
OPCs, and COPs to the corresponding cell states 
in our previous developmental mouse dataset 
(5) (Fig. 6F and fig. S13E). To enrich for func- 
tionally relevant genes, we first identified those 
genes that were differentially expressed between

Fig. 4. Ventral midbrain and dopaminergic neurons. (A and B) tSNE plot
of midbrain neural cells with annotated compartment, nuclei, and gene
expression (A) or FOXA2/FOXA1+ or TH+ cells and neighbors (B). (C) UMAP with
clusters and RNA velocity of ventral midbrain (VM) cells selected from (B).
(D) Identity prediction by logistic regression trained on VM cell types in (34).
Colored cells, probability >60%; gray cells, probability <60%. (E and F) Scheme
of gene expression along the ventral-dorsal (E) and anterior-posterior (F) axes
in the mouse VM at embryonic days 10.5 and 11.5, respectively. (G and H) UMAPs
with enriched expression of FOXA2;LMX1A or FOXA2;NKX genes (G) and
FOXA2;LMX1A;EN1 or FOXA2;LMX1A;PITX2 genes (H). (I and J) Random walks
(100 steps and 15 closest neighbors) from the encircled cluster indicated by
the asterisk to the encircled clusters without asterisks (inset) were isolated as
representative of the EN1 (I) or PITX2 lineages (J), respectively. (K) Average
log2 + 1 expression of genes in latent-time bins along the EN1, PITX2, or both
lineages. (L) UMAP with FOXA2+;LMX1A+;EN1+ cells enriched in SOX6, CALB1,
or both. (M) tSNE plot with TH+ clusters. (N and O) Log2 + 1 expression of
dopaminergic marker genes (N) or CXCR4 and TH+ class subtype markers
(O). (P) tSNE plot with TH+ cells colored by age. In (G), (H), (L), (N), and
(O), darker color indicates higher expression and gray indicates
negative cells.

Fig. 5. Glioblasts. (A) tSNE plot showing glioblasts, OPCs, and COPs, colored by region as indicated.
Labels indicate the putative spatial position of clusters given the marker genes indicated. The dashed line
approximately separates pure glioblasts (above) from pre-astrocytes (below) using AQP4 expression
(fig. S11D). (B) Fraction of pure glioblasts, pre-astrocytes, and pre-OPCs detected by age (corrected for
sampling; see Methods section Morphometric volume normalization). (C) Fraction of pure glioblasts, pre-
astrocytes, and pre-OPCs detected by age and region [corrected for sampling as in (B)].


OPCs and COPs in each species and then com- 
pared the resulting gene sets of the species (Fig. 
6, E and F; fig. S13E; and table S8). Of 211
differentially expressed genes, 67 (32%) were 
shared between species (Fig. 6F and fig. S13F). 
Many prototypical marker genes that are known 
in the early oligodendrocyte trajectory were con- 
served, such as PDGFRA (OPCs), ENPP6 (COPs), 
and MAG (COPs) (Fig. 6E and fig. S13F). The 
remaining 144 genes in each species were not
shared, and either their expression patterns 
diverged or the gene was not expressed in the 
other species (fig. S13F). For example, RIT2 
(OPCs) and BMP-binding endothelial regu- 
lator (BMPER) (COPs) clearly distinguished 
the two populations in humans as opposed 
to mice, where Cdo1 and Cbln2 labeled the
same groups (Fig. 6G and fig. S13, F and G). 
BMP4 was enriched in COPs in both species 
but was more highly expressed in mice (fig. 
S13F). These findings suggest a species-specific 
mode of BMP signaling, which may regulate 
the rate of differentiation of OPCs. 
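
The conservation figures quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (211 differentially expressed genes, 67 shared); the function name `conservation_summary` is a hypothetical helper introduced for illustration.

```python
def conservation_summary(n_total, n_shared):
    """Summarize how many differentially expressed genes are shared between species."""
    return {
        "shared": n_shared,
        "not_shared": n_total - n_shared,
        "percent_shared": round(100 * n_shared / n_total),
    }


# Counts taken from the text: 211 differentially expressed genes, 67 shared
summary = conservation_summary(211, 67)
```

The result agrees with the reported values: 144 genes are not shared, and the shared fraction rounds to 32%.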
Discussion

Our findings come with important caveats. 
Because of the nature of human samples, it was 
impossible to obtain timed specimens, which 
resulted in uneven coverage of temporal pro- 
cesses. Often, not every region in a specimen 
was free of damage, which led to missing re- 
gions that could only be partially compensated 
for by other specimens of similar age. We used 
freshly dissociated cells, but the conditions 
and length of time before sample collection 
could not be controlled. To minimize the delay 
between sample collection and cell capture, 
we used coarse dissections, which sometimes 
may not have coincided perfectly with neuro- 
anatomical boundaries. 

The results nevertheless expand our knowl- 
edge, especially of glial cell development, and 
underscore the fundamental similarity between 
all major cell classes that emanate from the 
neural tube. Neurons, astrocytes, and oligo- 
dendrocytes all develop from region-specific 
radial glia and transition through intermediate 
cell states (neuroblasts, glioblasts, neuronal IPCs, 
pre-astrocytes, and pre-OPCs) that retain most 
of that patterning. We also addressed the cen- 
tral question of when and how adult neuronal 
identity is acquired in the embryonic midbrain. 
TH+ class identity (TH+SOX6+, TH+CALB1+, 
and TH+GAD2+ classes) and elements of adult 
subtype identity (CUX2 or ALDH1A1 expression) 
were progressively acquired by 5 to 8 pcw, but 
embryonic subtypes differed from the adult, 
indicating a progressive acquisition of adult cell 
identity. 

We observed very similar patterns of region- 
specific glia in the embryonic brain when com- 
pared with the adult brain (79). Collectively, 
these data might suggest that the human brain 
makes use of highly regionally and locally spe- 
cialized neurons, supported by region-specific 
astrocytes and oligodendrocytes. The functional 
implications of this view remain to be un- 
covered, but it is worth highlighting that many 
disorders that implicate glia show strong region- 
specific patterns of occurrence. For example, 
pediatric high-grade gliomas show distinct his- 
tone mutations that are associated with occur- 
rence at specific neuroanatomical sites, and it 
would be interesting to evaluate whether region- 
specific glial cell types could explain some of 
these phenomena. 

Our cell census provides a rich resource for 
exploring and interpreting normal human brain 
development and diseased brain tissue, for as- 
sessing the fidelity of organoid models and the 
quality of reprogrammed cells used in cell re- 
placement therapy, and for identifying thera- 
peutic targets for human disorders that affect 
specific brain cell populations. 


Methods 
Human tissue collection 


We used fresh tissue from terminated preg- 
nancies. There are several important caveats. 
Because of the nature of the samples, it was 
impossible to obtain timed specimens, resulting 
in uneven coverage of temporal processes. Often, 
not every region in a specimen was free of 
damage, leading to missing regions that could 
only be partially compensated for by other 
specimens of similar age. We used freshly 
dissociated cells, but the length of time before 
sample collection and the ambient temperature 
and buffer conditions during that time could 
only be partially controlled. In order to mini- 
mize the delay between sample collection and 
cell capture, we used coarse dissections, which 
may sometimes not have coincided perfectly 
with neuroanatomical boundaries. 

The use of abortion material was approved 
by the Swedish Ethical Review Authority and the 
National Board of Health and Welfare (decisions 
2020-02074 and 2019-04595). For tissue col- 
lected at Karolinska Institute, patients seeking 
abortion at the gynecology clinic were asked 
about their interest in donating the aborted 

Fig. 6. Regional heterogeneity in oligodendrocyte precursors. (A) tSNE plot 
of OPC and COP clusters. The inset shows OPCs (black) in the whole dataset. 
The dashed circle indicates the location of OPCs. (B) Fraction of OPCs and COPs 
across age per main region. Cells from 5.5 pcw (red) and 6 pcw (black) are 
marked on the tSNE plot. (C) OPCs and COPs colored by region. (D) Top 
differentially expressed genes per region for OPCs. Genes selected with a false 
discovery rate <10~*° are indicated by the dashed line. (E) Venn diagrams 

for each region show overlap between classes: radial glia, OPCs, and neurons. 
(F) (Top left) Comparison between OPCs and COPs in human and mouse. 
(Bottom right) tSNE plots indicate clusters of OPCs, COPs, and cycling 
cells in both species. (Top right) Overlap between mouse and human of 
differentially expressed genes between OPCs and COPs (significance level) is 
shown in a Venn diagram. (Bottom left) Expression (bar plots) of common gene 
markers for OPCs (PDGFRA), COPs (ENPP6), and maturing COPs (MAG) 
highlighted in miniature tSNE plots. (G) Expression of species-specific 
differentially expressed genes [human: RIT2 (OPCs), BMPER (COPs); mouse: 
Cdo1 (OPCs), Cbln2 (COPs)], illustrated in bar plots and highlighted on 
miniature tSNE plots. 


tissue to research. Patients who agreed signed 
a written consent after receiving information, 
both written and oral, given by a physician or 
midwife. Age (postconception) of the embryos 
and fetuses was estimated using clinical information 
(last menstrual period, ultrasound), true 
crown-rump length, and anatomical landmarks. 
After the abortion, the tissue was immediately 
transported to the laboratory and dissected in 


ice-cold 0.9% NaCl solution. Most of the calvarium 
was cut away, and the brain was carefully 
lifted out of the skull base after separating 
the medulla oblongata from the cervical spinal 
cord. For scRNA-seq, different CNS regions were 
dissected using anatomical landmarks (22) and 
kept in ice-cold Gibco Hibernate-E medium 
(Thermo Fisher) until further processing. For 
spatial analysis, brains were covered by Tissue-Tek 
O.C.T. Compound (Sakura Finetek) in cryomolds, 
snap-frozen in a slurry of 2-methylbutane 
(Sigma-Aldrich) and dry ice, and stored at -80°C 
pending sectioning. 

For tissue collected in Cambridge, donated 
fetal tissue stored in Hibernate-E medium (Hib-E, 
Thermo Fisher Scientific A1247601) was collected 
from the local maternity hospital soon after 
passing. Anatomical landmarks were identi- 
fied similar to the Karolinska samples, referring 
to present literature (23). The relevant parts of 
brain tissue were dissected in a class II hood 
on the day of collection and stored in Hib-E 
overnight in a fridge. The tissue was shipped 
to Sweden at a refrigerated temperature the 
next day, normally delivered 2 days after abor- 
tion. The procedure is covered under ethics 
REC: 96/085. Subsequent processing was done 
as described for Karolinska samples. 


Cell dissociation 


Brain tissue was processed around 6 to 48 hours 
after tissue collection, depending on the source 
(Cambridge/Karolinska Hospital). Tissues that 
were not processed within the same day of 
collection were stored at 4°C in Hibernate E 
medium (Thermo Fisher) during transport or 
overnight. A carbogenated (95% O2/5% CO2) 
ice-cold Earle’s Balanced Salt Solution (EBSS) 
was used throughout the whole procedure. 
All brain regions were dissociated separately 
using the Worthington’s Papain Dissociation 
System (Worthington) (Protocols.io; https:// 
dx.doi.org/10.17504/protocols.io.xmbfk2n). 
Tissues were enzymatically digested at 37°C 
for 10 to 30 min (depending on the develop- 
mental time point, younger ages kept shorter), 
followed by regular trituration using fire-polished 
glass Pasteur pipettes. Cell suspensions 
were filtered through a 30-µm cell strainer 
(CellTrics, Sysmex), centrifuged for 5 min at 
200g to obtain cell pellets, followed by careful 
removal of the supernatants and resuspen- 
sion in EBSS (Worthington), using as small 
volume as possible depending on tissue size 
and cell density. Cell concentrations were estimated 
using a counting hemocytometer (Bürker/Neubauer 
chamber) and diluted with EBSS until 
the desired concentrations were reached. All 
suspensions were kept on ice until the next step 
of loading the cells on the 10X Chromium chips. 


Single-cell RNA sequencing 


Single cells were captured using the droplet- 
based single-cell RNA sequencing platform 
Chromium (10X Genomics). Roughly half of 
the sampling was done using the Chromium 
Single Cell 3′ Reagent Kits Version 2 and the 
other half with Version 3 (see detailed sampling 
in table S1). The reason was that the company 
updated its kits halfway through the project. Cell 
suspensions were adjusted to concentrations 
between 800 and 1200 cells/µl, targeting 3000 
to 5000 cells per reaction. cDNA synthesis was 
performed with 12 PCR cycles, and the rest of 
the library preparation was performed accord- 
ing to the manufacturer’s instructions (10X 
Genomics, Illumina). All libraries were sequenced 
on an Illumina NovaSeq 6000 (read 1: 
26 bp for v2, 28 bp for v3; read 2: 98 bp for v2, 
91 bp for v3) using S4 flow cells to a target sequencing depth of 
100,000 reads/cell. Sequencing saturation 
was examined for each sample using preseq 
(https://github.com/smithlabcode/preseq). 
Any samples that were not saturated to 60% 
were sequenced more deeply as needed, using 
preseq predictions. 


EEL FISH 


Four hundred forty genes were selected to 
cover major cell types in the human develop- 
mental brain using single-cell RNA-seq data 
and literature (table S3). EEL FISH was per- 
formed as previously described (4) with a minor 
change in the digestion step, which was performed 
twice for 10 min each with proteinase K at a 
concentration of 1 U/ml. Three sections were 
processed that covered the medial part of the 
developing brain. Two of those sections also 
contained the rest of the body and one the rest 
of the head. Detected RNA molecules that 
fell outside of the tissue area, based on the 
reference nuclei images, were removed using 
FISHscale (8). 


Standard Visium Spatial Gene Expression 
library preparation 


Fresh-frozen human embryo samples were cryosectioned 
at 10-µm thickness. Sections were 
placed onto 10X Genomics Visium arrays and 
stored at -80°C before processing. Spatial gene 
expression libraries were prepared following 
the 10X Genomics Visium Gene Expression 
protocol (https://assets.ctfassets.net/an68im79xiti/2q34xwfHy2nbeFlH47Bl0q/ffa48b53627a582c6b7f2c9fd90af91e/CG000239_Visium_Spatial_Gene_Expression_User_Guide_Rev_F.pdf). 
Finished libraries were sequenced on an Illumina 
NextSeq 2000 according to the manufacturer’s 
instructions (read 1, 28 bp; read 2, 90 bp). 
Visium libraries were processed using Space 
Ranger software from 10X Genomics (version 
1.3.1), with the following command: 
spaceranger count --id=${SAMPLE} \ 
  --fastqs=/home/zaneta.andrusivova/projects/hdca_embryo/XDD410/raw/ \ 
  --transcriptome=/fastdisk/10x/refdata-gex-GRCh38-2020-A \ 
  --sample=${SAMPLE} \ 
  --image=/home/zaneta.andrusivova/projects/hdca_embryo/XDD410/aligned_imgs/220215_HDCA_embryo_XDD410_${SLIDE}_${AREA}.jpg \ 
  --slide=$SLIDE \ 
  --area=${AREA} \ 
  --loupe-alignment=/home/zaneta.andrusivova/projects/hdca_embryo/XDD410/aligned_imgs/${SLIDE}-${AREA}.json 

Reads were aligned to the prebuilt human 
reference genome (GRCh38). Samples were se- 
quenced to a saturation between 63 and 88%. 


Computational analysis 
Data preprocessing 


Sequencing runs were demultiplexed with cellranger 
mkfastq version 4.0.0 (10x Genomics) 
and filtered through the index-hopping-filter 
tool version 1.1.0 (10x Genomics). Unique molecular 
identifier (UMI) counts were determined 
using STARsolo version 2.7.10a (RRID: 
SCR_021542) with the following parameters: 

--soloFeatures Gene Velocyto 
--soloBarcodeReadLength 0 
--soloType CB_UMI_Simple 
--soloCellFilter EmptyDrops_CR %s 0.99 10 45000 90000 500 0.01 20000 0.01 10000 
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts 
--soloUMIfiltering MultiGeneUMI_CR 
--soloUMIdedup 1MM_CR 
--clipAdapterType CellRanger4 
--outFilterScoreMin 30 

Barcode whitelists were downloaded from 
the 10x Genomics website. Exonic, intronic, and 
ambiguous counts were summed for clustering 
analysis. 

The reference genome and transcript anno- 
tations were based on the human GRCh38.p13 
gencode V35 primary sequence assembly. How- 
ever, we filtered the reference. In this study, 
only reads that were uniquely aligned to one 
gene were counted. Thus, all the reads that 
aligned to more than one gene were lost, and 
the related genes had lower or zero counts. 

To lower the rate of this read loss, we used 
the human GRCh38.p13 gencode V35 primary 
sequence assembly from which we discarded 
genes or transcripts that overlapped or mapped 
to the 3′ UTRs of other genes or noncoding RNAs, 
leaving only one of these transcripts in the 
genomic reference. We used BLAST (RRID: 
SCR_001653) to align the last 400 nt (3’ UTR) 
of all protein coding transcripts and noncod- 
ing transcripts to all other genes (maximum 
four mismatches, minimum alignment length 
300 nt). We resolved all the matches by the 
following procedure: 

1. Fusion genes were filtered based on their 
names: genes whose names contained both 
fused gene names [“gene1-gene2”] were discarded. 

2. Noncoding transcripts that matched another 
coding transcript were discarded. 

3. Overlapping transcripts (both examined 
transcripts were either coding or noncoding): 

a. If the name of one of the examined genes 
matched the pattern “XX#####+.#”, its transcript 
was discarded. 

b. If one of the examined transcripts belonged 
to a gene where one or more of its transcripts 
had already been discarded during the procedure, 
it was discarded as well. 

c. We discarded transcripts of the gene with 
a lower number of splice variants. 

d. Otherwise, if the 3′ UTR of one transcript 
aligned with the CDS of another, we selected 
the first transcript; otherwise, we randomly discarded 
one of them. 

4. For paralogs, we mapped all related genes 
(that aligned to one another) and selected one 
of the paralogs with the highest number of splice 
variants. All other highly similar paralogs were 
discarded. In special cases, we manually chose 
the gene. 

Altogether, this yielded a new assembly in 
which we filtered 387 fusion genes, 1140 overlap- 
ping transcripts, 414 noncoding transcripts, 1127 
coding paralogs, and 350 noncoding paralogs. 
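The name-based and pairwise rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to build the reference: the Transcript record, the is_fusion and pick_discard helpers, and the regular expression chosen for the clone-style name pattern are assumptions introduced here, and rules 3b and 3d, which depend on procedure state and alignment details, are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; the field names are illustrative."""
    gene: str
    coding: bool
    n_splice_variants: int


# Assumed reading of the clone-style name pattern: two capital letters,
# a run of digits, a dot, and a version number (for example "AC012345.1").
CLONE_NAME = re.compile(r"^[A-Z]{2}\d+\.\d+$")


def is_fusion(gene: str, known_genes: Set[str]) -> bool:
    """Rule 1: the name joins two known gene names with a hyphen."""
    left, sep, right = gene.partition("-")
    return bool(sep) and left in known_genes and right in known_genes


def pick_discard(a: Transcript, b: Transcript) -> Optional[Transcript]:
    """Return the transcript to discard for a matching pair, or None if undecided."""
    # Rule 2: a noncoding transcript that matches a coding one is discarded.
    if a.coding != b.coding:
        return b if a.coding else a
    # Rule 3a: a transcript whose gene has a clone-style name is discarded.
    a_clone = bool(CLONE_NAME.match(a.gene))
    b_clone = bool(CLONE_NAME.match(b.gene))
    if a_clone != b_clone:
        return a if a_clone else b
    # Rule 3c: the gene with fewer splice variants loses.
    if a.n_splice_variants != b.n_splice_variants:
        return a if a.n_splice_variants < b.n_splice_variants else b
    # Rules 3b and 3d are not modeled in this sketch.
    return None
```

Applying the rules in a fixed order keeps the outcome deterministic for every pair that the modeled rules can decide; undecided pairs are returned as None so that they can be handled separately.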


Quality control and preprocessing 


Initial quality control of each 10X Chromium 
sample as a whole was performed by manually 
examining the result of library preparation 
(deciding whether to sequence the sample at all) 
and the output of the primary analysis pipeline 
(deciding if the sample as a whole was of suf- 
ficient quality to be included at all). Samples 
that failed at these early stages are listed in 
table S1, but no cells from them are included 
in the main dataset. There were several modes 
of failure, including very low yield of RNA, 10X 
controller failure, low fraction of reads in cells 
(broken emulsion), and poor sequencing read 
quality (Q30 fraction). We did not use strict 
criteria, but to save money, we tried to avoid 
deep sequencing of poor-quality samples. There- 
fore, in many cases, the excluded samples were 
only subjected to initial shallow sequencing. 
The result of a Chromium experiment is a 
raw expression matrix over genes and droplets. 
For a high-quality sample, most droplets will 
have contained a single viable cell. However, 
some droplets may have contained a doublet or 
a piece of debris. In order to distinguish these 
possibilities and identify high-quality single- 
cell droplets, we developed the following al- 
gorithm. First, we used the DoubletFinder (24) 
algorithm to mark putative doublets. Next, we 
used the relationship between total UMI count 
and the fraction of unspliced UMIs (fig. S2, A 
to D) to classify droplets as cells, large cells 
(potentially multiplets), cytoplasmic debris 
(low total UMI, low unspliced fraction), cel- 
lular debris (low total UMI, normal unspliced 
fraction), bare nuclei (low total UMI, high un- 
spliced fraction), and mitochondrial debris 
(high fraction mitochondrial reads). To classify 
droplets, we first selected droplets likely to be 
single cells by including only droplets with an 
unspliced fraction greater than 0.1, total UMIs 
greater than 1500, and log(total UMIs) greater 
than log(200) + log(1000) * unspliced fraction. 
Next, we fit a two-dimensional Gaussian maximum 
likelihood estimate to the selected droplets 
(using the logarithm of the total UMIs) 
and then calculated the probability density 
function for all droplets, retaining only those 
with a probability greater than 0.1 (except those 
flagged as doublets). The remaining droplets 
were then classified as illustrated in fig. S2B 
based on their location relative to the good cells. 
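The droplet selection described above (threshold-based seed set, two-dimensional Gaussian fit, density cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline code: the function name select_cells is hypothetical, the natural logarithm is assumed for the Gaussian fit, and doublet flags from DoubletFinder would be applied separately to the returned mask.

```python
import math


def select_cells(total_umis, unspliced_fraction, min_density=0.1):
    """Return (keep flags, densities) for droplets, following the three steps in the text."""
    # Assumption: natural log of total UMIs is the first Gaussian dimension.
    pts = [(math.log(t), u) for t, u in zip(total_umis, unspliced_fraction)]

    # Step 1: seed set of likely single cells (thresholds taken from the text).
    seed = [
        (x, u)
        for (x, u), t in zip(pts, total_umis)
        if u > 0.1 and t > 1500 and x > math.log(200) + math.log(1000) * u
    ]

    # Step 2: maximum likelihood estimate of the 2D Gaussian (divide by n, not n - 1).
    n = len(seed)
    mx = sum(x for x, _ in seed) / n
    mu = sum(u for _, u in seed) / n
    sxx = sum((x - mx) ** 2 for x, _ in seed) / n
    suu = sum((u - mu) ** 2 for _, u in seed) / n
    sxu = sum((x - mx) * (u - mu) for x, u in seed) / n
    det = sxx * suu - sxu ** 2

    # Step 3: evaluate the fitted density at every droplet and apply the cutoff.
    densities = []
    for x, u in pts:
        dx, du = x - mx, u - mu
        # Squared Mahalanobis distance using the closed-form 2x2 inverse.
        maha = (suu * dx * dx - 2 * sxu * dx * du + sxx * du * du) / det
        densities.append(math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det)))
    return [d > min_density for d in densities], densities
```

Note that the third seed condition is independent of the logarithm base, because log(total UMIs) > log(200) + log(1000) * f is equivalent to total UMIs > 200 * 1000^f; only the Gaussian fit depends on the base assumed here.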


Cytograph and cytograph-shoji pipeline 

We used the latest version of our in-house cyto- 
graph pipeline, which implements standard 
analysis steps in a modular and scalable fashion. 
The version used here is called cytograph-shoji 
and uses a custom tensor database called shoji. 
However, the analysis steps are standard and 
can also be performed using publicly available 
single-cell analysis tools such as Scanpy and 
Seurat. 

For the initial clustering, we pooled all cells 
that had passed quality control. We selected 
the 1000 most variable genes using support-vector 
regression on the CV-versus-mean relationship, 
excluding immediate-early genes, cell cycle 
genes, mitochondrial genes, and noncoding 
RNA (the exact lists of genes excluded are 
available in the cytograph-shoji code, file 
“species/human.py’”). We performed principal 
components analysis and retained up to 50 
components, but only as many as would ex- 
plain 50% of the variance. We then trans- 
formed the PCA using Harmony (25) to correct 
for batch effects due to the 10X Chromium 
chemistry version (v2 or v3). We computed the 
manifold as a balanced k-nearest neighbors 
graph with k = 25 and Euclidean distance 
metric. We performed initial clustering on the 
manifold using the Leiden (26) algorithm, then 
removed clusters with fewer than 25 cells. We 
trained a support-vector classifier (using sto- 
chastic gradient descent with hinge loss) on 
the remaining cluster labels and then applied 
the classifier to the orphan cells to assign their 
cluster identity to one of the retained (not too 
small) clusters. We also computed the classi- 
fier probability for each cell, as well as the 
second-best cluster label, which could be used 
to judge the quality of clusters and transition 
zones between adjacent clusters. 
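The rule for choosing the number of principal components can be read in more than one way. The sketch below assumes one reading: keep the smallest number of components whose cumulative explained variance reaches 50%, capped at 50. The function name and its arguments are hypothetical and are not taken from cytograph-shoji.

```python
def n_components_to_keep(explained_variance_ratio, max_components=50, target=0.5):
    """Smallest component count reaching `target` cumulative variance, capped at `max_components`."""
    cumulative = 0.0
    for i, ratio in enumerate(explained_variance_ratio[:max_components], start=1):
        cumulative += ratio
        if cumulative >= target:
            return i
    # The target was never reached within the cap: keep everything up to the cap.
    return min(max_components, len(explained_variance_ratio))
```

The input is the per-component explained variance ratio in decreasing order, as reported by most PCA implementations.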

We then computed a dendrogram of the 
initial clusters and cut this to yield 40 subsets. 
Each subset was reclustered using the same 
procedure as above, and then all the resulting 
clusters were pooled again to make the com- 
plete dataset corresponding to table S2. A few 
clusters were manually identified as doublets 
based on inconsistent marker expression, high 
average doublet score and/or high average UMI 
count. These clusters remain in table S2 but 
are marked and were not used for subsequent 
analyses. 

We validated the anatomical origin of each 
cell by training and applying a classifier, inferring 
the provenance of cells in all samples 
from their expression profiles. The results largely 
agreed with expectations (fig. S2H). 

We computed gene enrichment, trinarization 
scores, autoannotation, and cell cycle scores as 
previously described (5). We used the expression 
of a set of well-known cell cycle genes as a proxy 
for active proliferation (the exact lists of genes 
are available in the cytograph-shoji code, file 
“species/human.py”). We calculated the cell 
cycle score as the fraction of UMIs those genes 
represented and used a threshold of 0.4% to 
call a cell cycling (fig. S3, A to C). Similarly, a 
score for each phase of the cell cycle was cal- 
culated for each cell using a subset of known 
genes expressed in that phase. For each phase 
we used these thresholds: S, 0.2%; G2M, 3%; G1, 
0.3%; otherwise Post-M. We assigned G2M > S > G1 > 
Post-M in case of multiple assignments. 
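The cell cycle score and phase assignment described above amount to a fraction, a set of thresholds, and a priority order. A minimal sketch follows; the function names are hypothetical, the strict "greater than" comparison is an assumption, and the gene sets would come from the lists in the cytograph-shoji code rather than from this example.

```python
CYCLING_THRESHOLD = 0.004  # 0.4% of a cell's UMIs
PHASE_THRESHOLDS = {"G2M": 0.03, "S": 0.002, "G1": 0.003}  # 3%, 0.2%, 0.3%
PHASE_PRIORITY = ["G2M", "S", "G1"]  # G2M > S > G1 > Post-M


def cell_cycle_score(cell_counts, genes):
    """Fraction of a cell's UMIs that come from the given gene set."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    return sum(cell_counts.get(g, 0) for g in genes) / total


def is_cycling(score):
    """Call a cell cycling when its score exceeds the 0.4% threshold."""
    return score > CYCLING_THRESHOLD


def assign_phase(phase_scores):
    """Pick the highest-priority phase whose score passes its threshold."""
    for phase in PHASE_PRIORITY:
        if phase_scores.get(phase, 0.0) > PHASE_THRESHOLDS[phase]:
            return phase
    return "Post-M"
```

Walking the phases in priority order resolves multiple assignments without a separate tie-breaking step.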

Gene enrichment is a measure of over- 
expression in a cluster relative to other clus- 
ters, taking into account both mean expression 
and fraction of nonzero cells. 

Trinarization is a measure of the probability 
P that a gene is expressed in more than 20% 
of the cells, resulting in calls of “not expressed” 
(P < 0.05), “ambiguous” (0.05 < P < 0.95), or 
“expressed” (P > 0.95). To determine whether 
a gene, or a set of genes, was expressed in a 
cluster, we used the product of the trinariza- 
tion score (i.e., the joint probability) with a 
cutoff of 0.95. Generally, when we say that a 
gene is expressed in a cluster, this is the formal 
method we used to support that statement. 
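To make the trinarization calls and the joint-probability rule concrete, the sketch below takes the per-gene probabilities P as given (how P is estimated is not shown) and applies the cutoffs from the text. The function names are hypothetical.

```python
import math


def trinarize(p, low=0.05, high=0.95):
    """Map a probability P to one of the three calls used in the text."""
    if p < low:
        return "not expressed"
    if p > high:
        return "expressed"
    return "ambiguous"


def gene_set_expressed(probabilities, cutoff=0.95):
    """A gene set counts as expressed when the product of its P values exceeds the cutoff."""
    return math.prod(probabilities) > cutoff
```

Because the joint probability is a product, a set of genes can fail the cutoff even when each gene passes on its own: two genes with P = 0.97 each give 0.97 * 0.97 = 0.9409, which is below 0.95.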

Auto-annotations are well-known sets of gene 
markers applied automatically during the analy- 
sis pipeline, using the same trinarization method. 

Cell classes were defined using autoannota- 
tion with the markers described in the main 
text. When a cluster occasionally received mul- 
tiple class labels, they were resolved by priority: 
fibroblast > immune > OPC > glioblast > radial 
glia > neuroblast > neuron > (other). When a 
cluster received no class label, one was assigned 
manually where possible. Neuronal IPCs were 
defined as proliferating neurons and neuroblasts. 

The excitatory lineage analysis in Fig. 3 was 
performed before the cytograph-shoji pipeline 
was completed and therefore used a previous 
version of cytograph (5). In brief, we used 10X 
Genomics cellranger pipeline (version 3.0.2) to 
align the reads from all telencephalic samples. 
RNA counts were attributed to spliced and 
unspliced transcripts by running velocyto (27) 
(version 0.17.11) with standard parameters. For 
initial classification, we used PCA and trans- 
formed it using Harmony to correct for batch 
effect resulting from 10X Chromium chemis- 
try versions. A radius nearest neighbor graph 
(RNN) was then computed using the informa- 
tion radius [also known as the Jensen-Shannon 
divergence (JSD)] to link cells with near-identical 
gene-expression states. This RNN graph was 
clustered using the Louvain algorithm. For cortical 
excitatory lineage analysis, we selected clusters 
that expressed EMX1. For ganglionic eminence 
interneuron analysis, we selected clusters that 
expressed DLX2 and FOXG1 together with 
clusters that expressed FOXG1 and not EMX1. 
We excluded OPCs (clusters that express both 
PDGFRA and OLIG1) for this analysis. 


Anatomical annotation of EEL FISH data 


The neural tube of a 5-pcw human embryo was 
annotated in three sagittal sections, in line with 
the prosomeric model of brain development. 
DAPI images and EEL data were opened in 
Napari (77), and nonoverlapping polygons were 
drawn to define neuroanatomical regions. De- 
cisions about defining region borders were 
guided by EEL genes depicted in Fig. 2A, as 
previously used in mouse E10.5 (5), the Allen 
Developing Mouse Brain E11.5 in situ hybridization 
data and anatomic reference atlas, and an 
anatomical reference atlas of the early human 
embryo (28). 

EEL genes were used to annotate rhombomeres 
r1 and r2, whereas the rest of the 
rhombomeres were annotated using clear anatomical 
hallmarks. Neither r7-r11 nor the three 
diencephalic prosomeres p1-p3 could be distinguished 
individually in this embryo. 

A region in the hindbrain was identified 
that was not part of the Allen atlas. Because 
it contained cells expressing NHLH1 and 
NRXN3 like the telencephalic subventricular 
zones, we called it the “hindbrain subventricular 
zone.” No clear border between r1 and 
r2 could be identified in this embryo, and we 
called the area of transition between them 
“r1|r2” (Fig. 2, A and B). A similar color palette 
to the Allen anatomic reference atlas was used 
for visualization of the regions. 


Spatial mapping of single-cell data 


We used the observed spatial expression of 
transcription factors and morphogens with 
known anatomical expression patterns (Fig. 2A 
and fig. S4B) to manually curate an anatomical 
atlas of the specimen following the prosomeric 
model (Fig. 2B and fig. S4A). We further used 
the expression of maturation markers (SOX2, 
radial glia; NHLH1, early neuroblasts; NRXN3, 
late GABAergic neuroblasts; STMN2, neurons) 
to designate the ventricular, subventricular, and 
mantle zones along the extent of the neural tube 
(Fig. 2B and fig. S4, C and E). To study the 
patterning of the neural tube, we selected the 
subventricular zone of the most medial sec- 
tion. The ventricular zone polygon was sub- 
divided in regular segments, and RNA molecules 
were counted in each subpolygon. Molecule 
counts were normalized by the area of the 
polygon and scaled to the maximum. Mean 
expression profiles of single-cell RNA-seq clus- 
ters of cells from 5-week-old samples were 
manually selected if they were likely located 
in the tissue structures included in the EEL 
FISH datasets, and the clusters were mapped 
to the hexagonally binned spatial data using 
FISHscale and Bonefight (4, 5). Mappings were 
locally smoothed using a distance-weighted 
average, and the clusters with the highest prob- 
ability for each spatial hexagonal bin were called 
to make the cluster location map. 
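The per-segment normalization used for the patterning profiles (molecule counts divided by polygon area, then scaled to the maximum) is simple enough to state as code. This is a sketch with a hypothetical function name; it assumes one count and one area per subpolygon, for a single gene.

```python
def normalized_profile(counts, areas):
    """Divide each segment's molecule count by its area, then scale to the profile maximum."""
    density = [c / a for c, a in zip(counts, areas)]
    peak = max(density)
    if peak == 0:
        return [0.0 for _ in density]
    return [d / peak for d in density]
```

For example, counts of 10, 40, and 20 molecules in segments of area 2, 4, and 8 give densities of 5, 10, and 2.5, and a scaled profile of 0.5, 1.0, and 0.25.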


Cortical excitatory lineage analysis 
Integration 


For this analysis, we subset cells from telencephalic 
(telencephalon and forebrain) samples 
that were assigned to clusters that expressed 
EMX1. We used the Scanpy package for most of 
this analysis unless mentioned otherwise (29). 
Preliminary analysis revealed clusters that 
were driven by the precise age, subregion, or 
specimen ID, and we observed batch effects 
beyond those due to RNA sequencing chem- 
istry differences. In order to overcome these 
effects, we used Scanorama (30) to integrate 
cells from the same postconceptional week 
and scVI as implemented in scvi-tools (version 
0.15.5) (31) to integrate cells from all time points. 
The cortical subregion and the specimen ID 
for each cell were considered as the technical 
covariates to correct for. We selected 1000 highly 
variable genes for the Scanorama analysis and 
700 highly variable genes for the scVI analysis 
using scanpy.pp.highly_variable_genes based 
on “seurat_v3” flavor and performed dimen- 
sionality reduction and batch correction for each 
postconceptional week. 

To keep the analysis consistent, subsets of postconceptional 
weeks that originated from the 
same embryo and the same region (5, 7, and 
14 pcw) were integrated with another postconceptional 
week subset (6, 8, and 13 pcw 
samples, respectively). We then computed nearest 
neighbors, clustering using the Louvain algorithm, 
and a UMAP projection using similarity 
in the scVI/Scanorama embedding. 


Radial glia and glioblast cell reclustering 


To further explore radial glia and glioblast 
cells in the excitatory neuron lineage, we reclustered 
pallial radial glia and glioblast cells 
(EMX1+ clusters) after removing the cell cycle 
effect using Harmony (correcting for both 10X 
chemistry differences and cell cycle phase). We 
extended this analysis by adding radial glia 
and glioblast cells from the subpallium and 
telencephalon region (either DLX2+ clusters 
or FOXG1+ clusters excluding OPCs). 


RNA velocity and DeepCycle analysis 


To compute RNA velocity, we used the scvelo package 
(version 0.2.4) (32) and ran stochastic mode 
analysis on each batch (cortical subregion or 
specimen as needed) that had more than 1000 
cells in every postconceptional week. To infer 
high-resolution cell cycle trajectories, DeepCycle 
was used based on the scvelo run for the subset of 
IPCs in each batch. IPCs were defined 
based on a score calculated by scanpy.tl.score_genes 
and given to each cell by the expression of 
EOMES, NHLH1, and NEUROD4, thus also 
including early differentiating neuroblasts. We 
could not infer a clear cell cycle trajectory for all 
examined batches. Eventually, we used the trajectory 
obtained for specimen XDD:313 forebrain 
samples from 8.5 pcw and specimen 
XDD:385 14 pcw. Cell cycle phases were inferred 
manually for ranges of the transcriptional phase θ 
based on the score given for each cell 
cycle phase (fig. S6, H and I). Mitosis was de- 
fined at the transcriptional phase where there 
was a sharp drop in UMI counts. 


Differential abundance analysis 


Differences in cell abundances associated with 
postconceptional age were tested using the 
Milo framework (13), with the Python imple- 
mentation milopy (https://github.com/emdann/ 
milopy). We tested differential abundance in 
three subsets of cells: (i) all cortical excitatory 
lineage cells (fig. S7A), (ii) 11- to 14-pcw cells of 
v3 Chromium chemistry samples (fig. S7B, top), 
and (iii) 6- to 10-pcw cells of v2 Chromium 
chemistry samples (fig. S7B, bottom). We 
constructed a KNN graph using similarity in 
the scVI embedding [k = 150 for subset (i) (v2 and v3 
cells), k = 50 for subset (ii) (v3 test), and k = 100 for subset (iii) 
(v2 test)]. We assigned cells to neighborhoods 
on the KNN graph using the function 
milopy.core.make_nhoods (prop = 0.1) and defined 
early neighborhoods (SpatialFDR < 0.05, logFC 
< -0.5) and late neighborhoods (SpatialFDR 
< 0.05, logFC > 0.5). Differential gene expression 
analysis was performed between early and 
late cells from each major class (see section 
Differential gene expression analyses). 
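The early and late neighborhood definitions reduce to two cutoffs on the Milo test output. The sketch below is illustrative only: the function name is hypothetical, and it takes the SpatialFDR and logFC values as plain numbers rather than reading them from a milopy result table.

```python
def label_neighborhood(spatial_fdr, log_fc, fdr_cutoff=0.05, lfc_cutoff=0.5):
    """Label one neighborhood as 'early', 'late', or 'not DA' using the cutoffs in the text."""
    if spatial_fdr < fdr_cutoff and log_fc < -lfc_cutoff:
        return "early"
    if spatial_fdr < fdr_cutoff and log_fc > lfc_cutoff:
        return "late"
    return "not DA"
```

Neighborhoods that are significant but have a small effect size (|logFC| of at most 0.5) fall into the "not DA" group under this rule.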


Ganglionic eminences analysis 
Cell selection 


We first extracted 12-pcw telencephalic clus- 
ters that either expressed DLX2 or were not 
excitatory (FOXG1+, EMX1-), excluding OPCs, 
and found trajectories that branched into 
lineages clearly defined by ganglionic eminence 
marker genes and neuronal genes (figs. S8A 
and S9, A to G). In order to examine the po- 
tential migration of telencephalic interneurons 
to nontelencephalic regions, we then broadened 
the scope to clusters that expressed DLX2 from 
the whole brain (fig. S8, B to D). Many of these 
cells did not express DLX2, suggesting that they 
were similar to telencephalic interneurons but 
probably did not originate in the telencephalon. 
The vast majority of the DLX2-expressing cells 
were dissected from the diencephalon, thala- 
mus, and hypothalamus. 


Ganglionic eminences signature 


Scores for GE signatures were generated by the 
Scanpy score_genes function on canonical markers 
(MGE: NKX2-1, SOX6, LHX6, NXPH1, SST; 
LGE: SIX3, FOXP1, EBF1; CGE: PROX1, NR2F1, 
NFIX, NR2F2). We used a threshold of 0.05 to 
determine the GE identity of each cell. In case of several 
assignments for a cell, the conflict was resolved by 
prioritizing LGE > CGE > MGE. 


Dissected region classifier 


A random forest classifier from sklearn 
(RandomForestClassifier) was used to predict 
the “contamination” of cells belonging to re- 
gions other than the dissected ones. The classi- 
fier was trained on all brain regions except for 
cells labeled “brain” or “head” with each region 
down-sampled to 70,000 cells per region, re- 
sulting in a training set of 558,702 cells and 
1000 (most variable) genes. The whole dataset 
(1,665,937 cells) was tested on the trained model, 
and by treating the dissected region labels as the 
true labels, the accuracy was estimated 
at 0.8. Assuming the model performed well 
enough, we used it to predict regions per cell, 
and the fraction of the most commonly pre- 
dicted region was calculated for each individual 
sequencing library. The following hyperpara- 
meters were tuned and used in the final train- 
ing of the classifier: min_samples_leaf = 3, 
max_features = 50, n_jobs = -1. 
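Once the classifier has predicted a region for every cell, the per-library summary described above is a counting step. The sketch below shows only that step, with a hypothetical function name and plain Python lists as input; training the random forest itself is not shown.

```python
from collections import Counter, defaultdict


def top_region_fraction(library_ids, predicted_regions):
    """For each sequencing library, return (most common predicted region, its fraction of cells)."""
    per_library = defaultdict(Counter)
    for lib, region in zip(library_ids, predicted_regions):
        per_library[lib][region] += 1

    result = {}
    for lib, counts in per_library.items():
        region, n = counts.most_common(1)[0]
        result[lib] = (region, n / sum(counts.values()))
    return result
```

A library whose top fraction is low, or whose top region differs from the dissected region, would be a candidate for closer inspection.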


Differential gene expression analyses 


All differential gene expression tests were 
performed using the diffxpy package (https:// 
github.com/theislab/diffxpy). The Wilcoxon 
rank-sum test was used unless mentioned 
otherwise. 


Cortical excitatory lineage neuronal-like
IPCs versus radial glia-like IPCs


To find genes related to neuronal-like IPCs 
and radial glia-like IPCs, we reclustered pallial 
IPCs after removing cell cycle effect using 
Harmony (correcting for both 10X chemistry 
differences and cell cycle phase) (fig. S6N). Next, 
we selected neuron-like clusters as the clusters 
that have the highest mean expression of 
NEUROD6 (these clusters also exhibit lower 
expression of SOX2) and radial glia-like clus- 
ters as the clusters of cycling cells that have 
high mean expression of SOX2 and low mean 
expression of NEUROD6 (fig. S6O). We further 
tested for differential gene expression between 
the neuron-like clusters and the radial glia-like 
clusters. We normalized raw gene read counts
by sequencing depth in each cell using the
scanpy.pp.normalize_total function. We considered
genes as significantly differentially expressed 
if the test FDR < 1 x 10°*° and a log2(fold 
change) > 1.2 or log2(fold change) < -1.2. The
results for this differential expression analysis 
are provided in table S4. 


Cortical excitatory lineage early versus 
late state 


After testing for differences in cell abundances
associated with postconceptional age (see the
section titled “Differential abundance analysis”),
we defined each cell as early, late, or not
differentially abundant (DA) based on its
neighborhood definition (fig. 7).

We then performed differential expression 
analysis between early and late cells of the main 
cortical excitatory lineage (excluding cells from 
the minor Cajal-Retzius lineage) (fig. S7A). To
avoid technical batch effects due to chemistry 
differences, we performed a test using all the 
samples accounting for batch differences. Be- 
cause this test was conservative, we also in- 
cluded two differential expression tests that 
compared early and late states within Chromium 
chemistry version 2 samples only, and within 
11- to 14-pcw Chromium version 3 samples 
only, respectively (fig. S7B). Overall, we used
three differential expression tests and
integrated the resulting differentially expressed
genes as follows:

Test A: Early versus late cells in each major 
class type were compared (fig. S7A). Wald test 
was performed, and we accounted for chem- 
istry in the test model. We considered genes as 
significantly differentially expressed if their 
mean expression was >0.01, the test FDR was <0.05,
and the log2 fold change was >1 or < -1.

Test B: Early versus late cells from 11- to 14- 
pcw Chromium chemistry v3 samples in each 
major class type were compared (fig. S7B, top). 

Test C: Early versus late cells from Chromium 
chemistry v2 samples in each major class type 
were compared (fig. S7B, bottom). 

For both tests B and C, we normalized raw 
gene read counts by sequencing depth in each 
cell using scanpy.pp.normalize_total function. 
Wilcoxon rank-sum test was used, and the gene 
was considered as significantly differentially ex- 
pressed if its mean expression >0.01, the test 
FDR <1 x 10°”, and a log2 fold change >1.2 or 
log2 fold change <-1.2. 
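
The gene-level filter shared by the three tests can be sketched as a single predicate. The defaults below are the test A cutoffs stated above (mean expression >0.01, FDR <0.05, absolute log2 fold change >1); for tests B and C the stricter cutoffs would be passed as arguments. The function name and the example values are hypothetical.

```python
def is_significant(mean_expr, fdr, log2fc,
                   min_mean=0.01, max_fdr=0.05, min_abs_log2fc=1.0):
    """Return True if a gene passes all three cutoffs.

    Defaults follow test A as described in the text; other tests use the
    same form with different max_fdr / min_abs_log2fc values.
    """
    return (mean_expr > min_mean
            and fdr < max_fdr
            and abs(log2fc) > min_abs_log2fc)


def direction(log2fc):
    """Label the direction of change; the sign convention (positive = higher
    in late cells) is an assumption made for this illustration."""
    return "late" if log2fc > 0 else "early"


# Made-up results for three genes: (mean expression, FDR, log2 fold change).
results = {
    "GENE_A": (0.50, 1e-8, 2.3),
    "GENE_B": (0.50, 0.20, 2.3),     # fails the FDR cutoff
    "GENE_C": (0.005, 1e-8, -1.7),   # fails the mean-expression cutoff
}
calls = {g: direction(v[2]) if is_significant(*v) else None
         for g, v in results.items()}
```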

To integrate the results, we used the fol- 
lowing procedure. For each class: 

- All the differentially expressed genes from
test A were included in the final table.

- If a gene was not significantly differentially
expressed in test A but was significantly
differentially expressed in both tests B and C,
it was assigned as either “early” or “late” when
the two tests agreed on the type of cells in
which it was up-regulated; otherwise, it was
assigned as “mixed.”

- If the gene was significantly differentially
expressed only in test B (late samples, v3), it
was assigned as differentially expressed (“late”)
only if it had higher expression in late cells.
Otherwise, it was assigned as “mixed.”

- If the gene was significantly differentially
expressed only in test C (early samples, v2), it
was marked as differentially expressed (“early”)
only if it had higher expression in early cells.
Otherwise, it was assigned as “mixed.”

- If the pattern of the change was inconsistent
between different classes, the gene was assigned
as “mixed.”

HLA genes and gender-related genes were further
removed from the final set of differentially
expressed genes. A summary of the results of
these differential expression analyses is
provided in table S5.
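
One way to read the integration procedure is as a small decision function, sketched below. The sketch assumes each test reports, per gene and class, either a direction ("early" or "late") or nothing (None) when the gene is not significant; it takes the v3-only test to be test B and the v2-only test to be test C. The function names are hypothetical, and the handling of edge cases is an interpretation rather than the authors' code.

```python
def integrate_tests(call_a, call_b, call_c):
    """Combine per-gene calls from tests A, B, and C for one class.

    Each argument is "early", "late", or None (not significant).
    Returns "early", "late", "mixed", or None.
    """
    if call_a is not None:                 # test A results are kept as they are
        return call_a
    if call_b is not None and call_c is not None:
        return call_b if call_b == call_c else "mixed"
    if call_b is not None:                 # only test B (late samples, v3)
        return "late" if call_b == "late" else "mixed"
    if call_c is not None:                 # only test C (early samples, v2)
        return "early" if call_c == "early" else "mixed"
    return None


def combine_classes(per_class_calls):
    """Mark a gene "mixed" when its direction differs between classes."""
    called = {c for c in per_class_calls if c is not None}
    if not called:
        return None
    return called.pop() if len(called) == 1 else "mixed"


example = combine_classes([
    integrate_tests(None, "late", "late"),   # consistent in B and C -> "late"
    integrate_tests("late", None, None),     # significant in test A -> "late"
])
```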

GO term and pathway enrichment analysis was
performed using the implementation of the EnrichR
workflow in the gseapy Python package
(https://pypi.org/project/gseapy/). We used the
following gene sets: GO_Molecular_Function_2018,
GO_Cellular_Component_2018,
GO_Biological_Process_2018, KEGG_2016,
KEGG_2021_Human, and WikiPathway_2021_Human.
The results of this analysis are provided in
table S6.


OPCs 


The test was applied on the raw gene expres- 
sion matrix (without any prior log-normalization) 
where each cell was scaled to the median total 
molecules. For the mouse and human compar- 
ison, gene expression values were specifically 
scaled to 4000 molecules per cell (in between 
the median total molecules per species, human: 
6065 UMIs versus mouse: 3824 UMIs). In the 
species comparison, all cells from the OPCs and 
COPs were compared within each species. 
Differentially expressed genes were selected 
by a cut-off of following parameters: FDR <10~° 
(mouse), FDR <10°“ (human), mean expres- 
sion >0.01, and a log2 fold change =1.5 (ab- 
solute value). Different cutoffs were used per 
species to compensate for differences in num- 
ber of cells sampled and the sensitivity of the 
single-cell chemistry used and were adjusted 
to select a roughly equal number of genes. For 
the regional comparison of the OPCs and 
COPs, all cells from each dissected region were 
tested against the rest of the cells from the 
other regions (excluding the cycling OPCs). 
Differentially expressed genes were selected 
based on these thresholds: FDR <10~*’, mean 
expression > 0.01, and a log2 fold change >1.2 
(selecting only up-regulated genes). 
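
The scaling step used before these tests can be sketched in plain Python. The sketch assumes a dense cells-by-genes matrix given as nested lists; the function name scale_cells and the toy counts are illustrative. When no target is given, the median of the per-cell totals is used, as described for the within-species tests, and a fixed target such as 4000 corresponds to the cross-species comparison.

```python
from statistics import median


def scale_cells(counts, target=None):
    """Scale each cell (row) so that its total equals `target`.

    counts: list of per-cell lists of raw molecule counts.
    target: desired total per cell; defaults to the median cell total.
    No log transformation is applied, matching the description in the text.
    """
    totals = [sum(cell) for cell in counts]
    if target is None:
        target = median(totals)
    return [
        [x * target / total if total else 0.0 for x in cell]
        for cell, total in zip(counts, totals)
    ]


raw = [[1, 3], [4, 4]]                    # two cells, two genes (made-up counts)
scaled_median = scale_cells(raw)          # median total is 6
scaled_fixed = scale_cells(raw, target=4000)
```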


Radial glia, neuron, and OPC comparison 


Examination of differentially expressed genes 
was done in radial glia and neurons in order to 
reveal whether their regional specificity was 
shared with the patterns seen in OPCs. Radial 
glia and neurons were down-sampled mainly 
from clusters where the most common region 
was either forebrain, midbrain, or hindbrain 
(with a few clusters manually excluded if they 
labeled more than one region on the tSNE em- 
bedding). Next, cells were further selected only 
if they originated from these regions, because 
some selected clusters contained cells from mul- 
tiple regions from the first down-sampling step. 
After down-sampling and regional selection,
the total number of cells in each class was
6995 radial glia and 8521 neurons, representing
nearly equal numbers per region. For each class
(radial glia and neurons), cells were normalized
and scaled to the median total UMIs. Then cells
from each dissected region were tested against
the rest of the cells from other regions. Gene
lists were retrieved by using the same cutoff
as for the OPCs, resulting in the following
numbers of differentially expressed genes in
forebrain, midbrain, and hindbrain: 288, 193,
and 498 for radial glia and 483, 226, and 228
for neurons. The Venn diagrams summarize the
overlap of genes between all classes: radial
glia, neurons, and OPCs. Genes shared between
all three classes (radial glia, OPCs, and
neurons) are highlighted in table S7.


Midbrain analysis 


Cells with midbrain regional annotation were 
processed with Cytograph-shoji default param- 
eters. From this, neural clusters were isolated. 
For this analysis, we also considered the differ- 
ences between the sum of the trinarization 
score and the product of the trinarization score 
to determine whether genes that are analogous 
for cell-type identification were expressed in 
a cluster, in addition to the standard auto- 
annotation procedure. 


Ventral midbrain 


We isolated cells that were autoannotated as
FOXA2+/FOXA1+ and/or TH+, together with their
nearest neighbors on the RNN graph, for further
processing with Cytograph-shoji with the
following adjustments: (i) the 2000 most variable
genes, excluding immediate-early genes, cell
cycle genes, mitochondrial genes, noncoding RNA,
and gender-specific genes, and (ii) PolishedLeiden
with a minimum size of 10. Other configurations
were left as default.


Ventral midbrain — RNA velocity and 
lineages analysis 


To recover RNA velocity, we used the scvelo
package (version 0.2.4) in dynamical mode
(32). The following configurations were used:
(i) 2000 most variable genes were selected 
with “svr” flavor, excluding immediate-early 
genes, mitochondrial genes, and sex-specific 
genes, and had minimum 20 shared counts 
for both spliced and unspliced transcripts, (ii) 
the KNN graph was built on integrated PCA 
(output from Cytograph-shoji pipeline) with 
25 neighbors, and (iii) moments were computed 
with 25 neighbors. Other configurations were 
left as default. Latent time was calculated with 
default parameters. 

For the EN1 and PITX2 lineage analysis, random
walks were performed starting from cells in a
neuronal progenitor cluster, labeled in fig.
S11P, on a transition matrix computed from RNA
velocity vectors and the KNN graph with equal
weighted means, using 15 nearest neighbors and
100 steps. Cells from walks that terminated in
selected clusters (Fig. 4, I and J) were isolated
and binned along the latent time. Pearson
correlation between highly variable genes and
latent time was calculated to associate genes
with the respective lineages, excluding
immediate-early genes, cell cycle genes,
mitochondrial genes, and sex-specific genes,
with the CellRank (33) (version 1.5.1)
implementation of the correlation test
(cellrank._utils._correlation_test). The Scanpy
(29) (version 1.8.2) implementation of the
Wilcoxon rank-sum test was used to identify
differentially expressed genes along the
latent-time bins.
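
The random-walk step can be illustrated with a toy example. This sketch assumes a row-stochastic transition matrix over cells is already available (in the study it was derived from RNA velocity and the KNN graph); the three-state matrix, the function name random_walk, and the fixed seed are purely illustrative and are not the authors' implementation.

```python
import random


def random_walk(transition, start, n_steps=100, rng=None):
    """Walk n_steps over a row-stochastic transition matrix.

    transition: list of rows; transition[i][j] is the probability of moving
    from cell i to cell j. Returns the index of the cell where the walk ends.
    """
    rng = rng or random.Random()
    state = start
    for _ in range(n_steps):
        row = transition[state]
        state = rng.choices(range(len(row)), weights=row, k=1)[0]
    return state


# Toy matrix: cell 0 is a progenitor, cell 2 is a terminal (absorbing) state.
toy_transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
rng = random.Random(0)
endpoints = [random_walk(toy_transition, start=0, n_steps=100, rng=rng)
             for _ in range(20)]
```

Walks that end in a cluster of interest identify the cells (here, the visited terminal state) that would then be binned along latent time.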


Ventral midbrain 


To compare with the cell types defined in (34), a
logistic regression (35) was trained on highly
variable genes, excluding immediate-early
genes, cell cycle genes, mitochondrial genes,
ribosomal genes, and gender-specific genes.
Eighty percent of the normalized and scaled
data was used for optimizing the regularization
strength (C) using Optuna (36), a Bayesian
hyperparameter optimization framework, with the
default sampler over a log-uniform distribution
(between 0.001 and 2) for 50 trials. Over the
trials, C was optimized by evaluating the F1
score on a five-fold train-test split per trial.
Mean accuracy per trial was also logged. The
optimized C was validated by predicting the
remaining 20% of the data at the end of the
optimization. The final model was trained on the
full dataset with the same approach (C = 0.37)
and used for comparing the current ventral
midbrain cells.


Dopaminergic cells 


Clusters predicted to be dopaminergic cells 
were isolated and processed with the same 
configurations as with the ventral midbrain 
subset. 


Morphometric volume normalization 


When estimating the proportion of classes of 
cells (e.g., as in Fig. 1B), normalization was 
necessary. We did not sample regions and time 
points uniformly, and therefore a raw estimate 
would be skewed by over- and undersampling. 
In addition, the brain as a whole grows in 
volume, and different regions grow at different 
rates. In particular, the telencephalon expands 
substantially during the period we sampled. 
We therefore normalized our sampling to the 
expected tissue volume (excluding ventricles) 
based on morphometric estimates of forebrain, 
midbrain, and hindbrain size by age (37, 38). 
We omitted those time points where we lacked 
complete coverage (e.g., when hindbrain was 
sampled only as pons, omitting medulla); this 
is why 13 and 14 pcw are missing from Fig. 1B.
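
As a rough illustration of volume-based normalization, the sketch below weights the class proportions observed in each region by that region's share of total tissue volume at a given age. This is only one plausible reading of the procedure; the function name, the two-region example, and all numbers are hypothetical and are not the morphometric estimates used in the study.

```python
def volume_weighted_fractions(class_fractions, region_volumes):
    """Combine per-region class proportions into whole-brain proportions.

    class_fractions: {region: {cell_class: fraction of sampled cells}}
    region_volumes:  {region: expected tissue volume at this age}
    Each region contributes in proportion to its volume rather than to the
    number of cells that happened to be sampled from it.
    """
    total_volume = sum(region_volumes[r] for r in class_fractions)
    combined = {}
    for region, fractions in class_fractions.items():
        weight = region_volumes[region] / total_volume
        for cell_class, frac in fractions.items():
            combined[cell_class] = combined.get(cell_class, 0.0) + weight * frac
    return combined


# Made-up example: forebrain is three times the volume of hindbrain.
fractions = {
    "forebrain": {"neuron": 0.5, "glia": 0.5},
    "hindbrain": {"neuron": 1.0},
}
volumes = {"forebrain": 3.0, "hindbrain": 1.0}
whole_brain = volume_weighted_fractions(fractions, volumes)
```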


Illustrations 


The embryo image in Fig. 1A is a rendering of 
an actual human embryo, overlaid on a three- 
dimensional volume representing the nervous 
system, redrawn from data provided by HDBR 
(https://hdbratlas.org).

INTRODUCTION: Cortical development in humans
is a meticulously regulated process that unfolds 
over months of prenatal and years of postnatal 
life. This process encompasses the genesis, dif- 
ferentiation, and maturation of various cellular 
lineages that are critical to the complex struc- 
ture and function of the brain and are affected 
in neurodevelopmental conditions. An area of 
active investigation is understanding the mo- 
lecular mechanisms that define the develop- 
mental trajectory of specific cortical lineages. 
Recently, research in this field has extensively 
leveraged single-cell genomics, but its application 
has been largely confined to studying the second 
trimester of human cortical development.

RATIONALE: Recent advances in technology, such
as single-nucleus RNA sequencing (snRNA-seq), 
create new possibilities to explore previously 
uncharted territories of cortical development. 
These innovative tools can capture the molec- 
ular progression of cortical lineages across the 
life span, creating a more comprehensive and 
precise transcriptomic atlas of human cortical 
development. Furthermore, the integration of 
snRNA-seq and single-nucleus chromatin acces- 
sibility data allows for the identification of gene 
regulatory networks and transcription fac- 
tors that are instrumental in shaping specific 
cortical lineages. This comprehensive approach
could potentially unveil sex- and region-specific
features and presents an unprecedented
opportunity to shed light on lineage-specific
susceptibility to neurodevelopmental and
psychiatric conditions, such as autism spectrum
disorder (ASD).

Single-cell genomics analysis of human cortical development across prenatal and postnatal life.

(A) Developmental stages captured in this study. (B) Identification of cell types and lineages across
developmental stages and molecular modalities. UMAP, uniform manifold approximation and projection; OPCs,
oligodendrocyte precursor cells; Oligos, oligodendrocytes; ATAC, assay for transposase-accessible chromatin.
(C) Discovery of lineage-specific developmental genes and enhancer gene regulatory networks. L2-3,
upper-layer intratelencephalic projection neurons; PV-BSK, parvalbumin basket interneurons.


RESULTS: We generated snRNA-seq data from 
human cortical samples during prenatal and 
postnatal stages of development and inte- 
grated these data with previously published 
datasets. Our study involved the analysis of 
>700,000 snRNA-seq profiles sourced from 
169 tissue samples and 106 donors. Using single- 
cell trajectory analysis, we identified develop- 
mental programs linked to the genesis of specific 
cortical cell types, including subtypes of ex- 
citatory neurons, interneurons, glial cells, and 
brain vasculature. We also determined sex- and 
region-specific developmental transcriptomic 
programs used by specific cortical lineages. By 
intersecting lineage-specific transcriptomic
profiles with single-nucleus chromatin acces- 
sibility data, we defined enhancer gene regula- 
tory networks and transcription factors pivotal 
to the commitment of defined cortical lineages. 
Using our insight into lineage-specific molecular
developmental programs, we identified
cell lineages and developmental stages impli- 
cated in the risk of neurodevelopmental dis- 
orders and observed that lineage-specific gene 
expression programs up-regulated in female 
cells are particularly enriched for the genetic 
risk factors of autism. This finding suggests 
that males may have an increased susceptibil- 
ity to haploinsufficiency in ASD, which could 
provide a plausible explanation for the observed 
increased incidence of autism in males. 


CONCLUSION: Our study illuminates the molecular
changes underlying the development of human
cortical lineages. By integrating single-nucleus
RNA expression and chromatin accessibility
profiling, we charted a comprehensive
transcriptomic atlas of cortical lineages across prenatal
and postnatal development, identified key tran- 
scriptional networks, highlighted sex-specific 
developmental changes, and defined cell types 
and developmental stages most enriched for ge- 
netic risk factors of neurodevelopmental dis- 
eases. Our results shed light on lineage-specific 
mechanisms of normal cortical development, 
the genetic vulnerabilities to developmental 
brain disorders, and the role of sexually di- 
morphic gene expression in the pathogenesis 
of autism.

We analyzed >700,000 single-nucleus RNA sequencing profiles from 106 donors during prenatal and
postnatal developmental stages and identified lineage-specific programs that underlie the development
of specific subtypes of excitatory cortical neurons, interneurons, glial cell types, and brain vasculature.
By leveraging single-nucleus chromatin accessibility data, we delineated enhancer gene regulatory
networks and transcription factors that control commitment of specific cortical lineages. By intersecting
our results with genetic risk factors for human brain diseases, we identified the cortical cell types
and lineages most vulnerable to genetic insults of different brain disorders, especially autism. We find
that lineage-specific gene expression programs up-regulated in female cells are especially enriched
for the genetic risk factors of autism. Our study captures the molecular progression of cortical lineages
across human development.


Development of the human cerebral cortex
spans months during prenatal stages and
years after birth, generating tens to hundreds
of cell types across multiple cortical areas.
This complex process is orchestrated by
lineage-specific gene expression programs
that guide the production, migration,
differentiation, and maturation of neuronal 
and glial cell types, as well as the formation of 
projections and neuronal circuits. Alterations in 
these regulatory gene programs during develop- 
ment lead to the pathogenesis of neurodevel- 
opmental and psychiatric disorders, including 
autism spectrum disorder (ASD) and schizophrenia
(SCZ). Most previous studies have focused on
investigating the molecular processes that
underlie human cortical development during the
second trimester of gestation (1-5), which is
the peak of cortical neurogenesis and neuronal
migration. These studies have
revealed molecular signatures of progenitor 
cells and neuronal and glial cell types, as well 
as the early specification of neurons into broad 
subtypes and their arealization across the 
cortex. However, later stages of human cortical 
development—including the third trimester of 
gestation, birth, and neonatal and early post- 
natal development—have been largely studied 
using bulk genomic approaches. 


Single-nucleus RNA sequencing analysis 
of prenatal and postnatal human 
cortical development 


To gain a comprehensive view of human cor- 
tical development across prenatal and postnatal 
stages, we used single-nucleus RNA sequenc- 
ing (snRNA-seq) (6) to profile 413,682 nuclei 
from 108 tissue samples derived from 60 neuro- 
typical individuals. We sampled nuclei from 
ages spanning from the second trimester of 
gestation to adulthood, including samples from 
the third trimester and early postnatal stages 
that are often excluded or underrepresented 
in genomic studies of the human brain. We ac- 
quired data from the ganglionic eminences— 
the major source of cortical interneurons (7, 8)— 
and from the cortex. We used Seurat (9) to 
perform unbiased clustering and uniform mani- 
fold approximation and projection (UMAP) 
embedding. After removing a cluster of cell de- 
bris (fig. S1A), we retained 358,663 nuclei. To 
extend our analyses to more brain samples and 
nuclei, we integrated our data with published 
datasets of prenatal and postnatal human 
cortical development (10-12). After data
integration (fig. S1B), our final dataset included
709,372 nuclei and 169 brain tissue samples
from 106 individuals (Fig. 1A and data S1).
We identified clusters corresponding to neu- 
ral progenitors and to the major subtypes of 
excitatory and inhibitory neurons, glia, and 
vascular cells (Fig. 1, B and C), indicating that 
we were able to capture transcriptomic changes 
underlying differentiation and maturation 
of cortical cell types across development. We 
detected similar numbers of genes, transcripts, 
and mitochondrial RNA ratios across different 
samples (fig. S1C), with a median of 1106 genes 
and 1609 transcripts per nucleus and some 
variability from sample to sample. These rela- 
tive numbers are comparable with published 
single-cell genomics data collected from the 
human brain (13), with mature neuron cell
types expressing higher numbers of genes and
transcripts than other cell types (fig. S1D). We
observed neither batch effects (nuclei from
different samples were well intermixed) nor
clusters composed of nuclei from a single
sample (fig. S1E). Nuclei were captured from the
prefrontal, cingulate, temporal, insular, and
motor cortices (fig. S1F). For prenatal samples
that were not sex-identified, we determined
their sex using sex-specific gene expression
(fig. S1G). Our dataset included 45 female and
61 male subjects. We observed that nuclei clus- 
tered according to developmental age (Fig. 1D), 
suggesting that transcriptomic changes asso- 
ciated with development are a major driver of 
cell identity. 


Analysis of specific excitatory neuron 
and interneuron lineages 


We next examined the developmental trajec- 
tories of excitatory and inhibitory neurons. First, 
we selected clusters corresponding to dorsal 
forebrain progenitors (including radial glia 
and intermediate precursor cells) as well as 
clusters containing excitatory neurons. By
re-clustering these data and referencing
molecularly defined cell types annotated in the
Allen Brain Atlas (14), we identified clusters
corresponding to known subtypes of excitatory
neurons, including upper-layer (L2-3) and
deep-layer intratelencephalic (L5-6-IT) projection
neurons, layer 4 neurons (L4), and layer 5 (L5)
and layer 6 (L6) extratelencephalic projection
neurons, as well as subplate neurons (SP) that
were present transiently during the second
trimester (Fig. 2A and fig. S2A). We next used
the analysis toolkit Monocle 3 (15) along with
custom scripts (see methods in the supple- 
mentary materials) to construct cellular tra- 
jectories on the basis of snRNA-seq data (Fig. 2A 
and fig. S2B), select trajectory branches corre- 
sponding to specific lineages, and calculate 
pseudotime for each nucleus. Pseudotime cor- 
responded well to the developmental age of 
nuclei in each lineage (Fig. 2A). We identified 
several branching points in the trajectory:
between two major groups of excitatory neurons,
L2-3, L4, and L5-6-IT (Ex1) and L5 and L6 (Ex2),
and between L4 and L2-3 or L5-6-IT (Ex3).

Fig. 1. Brain tissue samples used for data collection and initial clustering
of snRNA-seq data. (A) Overview of the tissue samples used in this study,
including the number of individuals and the ages and brain regions captured in
the snRNA-seq dataset. MGE, medial ganglionic eminence; LGE, lateral ganglionic
eminence.

Next, we aimed to investigate developmental
gene expression changes during differentiation
and maturation of γ-aminobutyric acid-mediated
interneuron (IN) lineages. We
selected nuclei from both ventral forebrain pro- 
genitors and cortical interneurons, re-clustered 
the data, and identified known classes of cortical 
interneurons (Fig. 2B and fig. S2C), including 
interneurons expressing vasoactive intestinal 
polypeptide (VIP), calretinin (CALB2), reelin 
(RELN), and nitric oxide synthase (NOS);
chandelier (PV-CH) and basket (PV-BSK)
interneurons expressing parvalbumin (PV), membrane
metalloendopeptidase (MME), and tachykinin
precursor 1 (TAC1); and interneurons expressing
somatostatin (SST) and co-expressing SST
and reelin (SST-RELN). We then reconstructed
lineage trajectories corresponding to each
interneuron subtype (Fig. 2B and fig. S2D), as
well as points of trajectory divergence, such as
trajectory branches including medial ganglionic
eminence (MGE)-derived (IN1) and caudal
ganglionic eminence (CGE)-derived (IN2)
interneurons. We calculated pseudotime for each
nucleus, which correlated well with the
developmental age of the interneurons. Next, we
asked whether different neuronal lineages 
in the human cortex mature at different rates. 
We correlated pseudotime with the develop- 
mental age in each neuronal lineage and ob- 
served that neuronal types fell into two main 
groups: those that mostly matured by the end 
of the second trimester, and those whose tran- 
scriptome profiles continued to change through 
the third trimester and after birth (Fig. 2C). The 
first group included L5, L5-6-IT, and all
interneuron subtypes, whereas the second group
contained L2-3, L4, and L6 excitatory neurons.
This result suggests that certain types of hu- 
man cortical neurons have a protracted mat- 
uration timeline. 
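
The comparison of pseudotime with developmental age can be illustrated with a per-lineage correlation. The text does not state which correlation measure was used, so the Pearson coefficient below is an assumption, and the lineage names and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-nucleus values: (developmental age in years, pseudotime).
lineages = {
    "lineage_1": ([0.0, 0.5, 1.0, 10.0], [0.1, 0.2, 0.4, 0.9]),
    "lineage_2": ([0.0, 0.5, 1.0, 10.0], [0.8, 0.9, 0.9, 0.9]),
}
age_pseudotime_r = {name: pearson(age, pt) for name, (age, pt) in lineages.items()}
```

Computing the coefficient separately for each lineage gives one number per lineage that can be compared across lineages.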

Once we isolated trajectory branches corre- 
sponding to each neuronal lineage, we sought 
to identify lineage-specific gene expression 
programs. We used an approach that allows 
identification of lineage-specific programs by 
comparing dynamic expression profiles of each 
gene in a lineage of interest to all other
neuronal, glial, and non-neural lineages (see
methods). In addition, we applied this approach
to identify genes specific to related lineages in
the excitatory neuron and interneuron trajectory
branches. In total, we identified 1062
lineage-specific genes and 405 branch-specific
genes (data S2). We classified these genes
according to the age of onset of gene expression
(50% of the maximum expression) and per- 
formed Gene Ontology (GO) analysis for the 
genes up-regulated at each developmental time 
point (Fig. 2D). During the second trimester 
of gestation, we saw enrichment in pathways 
related to neurogenesis, differentiation, and 
process growth. Up-regulation of synaptogen- 
esis and ion transport pathways could be ob- 
served during the third trimester but was most 
profound between birth and 1 year of age. 
Enrichment in synaptic pathways could be 
observed until adulthood. 
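
The onset criterion (50% of the maximum expression) can be sketched as follows. The sketch assumes that a gene's expression has already been summarized along ordered developmental time points; the function name onset_age, the age labels, and the expression values are hypothetical.

```python
def onset_age(ages, expression, fraction=0.5):
    """Return the first age at which expression reaches `fraction` of its
    maximum, or None if the inputs are empty.

    ages:       ordered developmental time points (earliest first)
    expression: expression value of one gene at each time point
    """
    if not expression:
        return None
    threshold = fraction * max(expression)
    for age, value in zip(ages, expression):
        if value >= threshold:
            return age
    return None


ages = ["2nd trimester", "3rd trimester", "0-1 years", "adult"]
late_gene = [0.1, 0.4, 1.0, 0.9]    # reaches half-maximum after birth
early_gene = [0.8, 1.0, 0.3, 0.1]   # already above half-maximum at the start
onsets = {"late_gene": onset_age(ages, late_gene),
          "early_gene": onset_age(ages, early_gene)}
```

Grouping genes by the returned age yields the per-stage gene sets on which an enrichment analysis could then be run.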

Fig. 2. Analysis of excitatory and inhibitory neuron lineages. (A) Cell types,
reconstructed single-cell trajectories, and age distribution for subtypes of

In addition to classifying genes according to
their age of appearance, we also characterized
dynamic expression patterns of lineage-specific
genes. The two most common patterns we observed
were transient expression and burst expression,
in which up-regulation starts at a certain age
and continues into adulthood (Fig. 2E). Our
analysis identified several putative
regulators of neuronal lineage commitment,
such as transcriptional regulator NI specific
to L2-3, L5-6-IT, and L4 neurons; noncoding
RNAs CYP1B1-AS1 and LINC00507 enriched in
L2-3 neurons; and HS3ST4 specific to L5
neurons. We saw that genes enriched in more
broad lineage branches tended to be tran- 
siently expressed, whereas genes specific to 
mature neuronal cell types mostly followed 
burst expression patterns (Fig. 2F). This sug- 
gests gradual commitment and specification 
of neuronal cell types through a series of tran- 
sient and burst transcriptional events. We also 
classified additional less common expression 
patterns, such as biphasic expression (fig. S2E), 
and identified different biological processes 
enriched for genes with burst and transient 
expression patterns (fig. S2F). Finally, we iden- 
tified genes dynamically expressed during the 
specification of subplate neurons by compar- 
ing lineages during the second and third tri- 
mester of gestation when these cells are present 
(fig. S2G). Using spatial transcriptomic analy- 
sis of 140 genes across three developmental 
time points, we were able to identify and
visualize the spatial location of cell-specific
clusters overlaid on the tissue cytoarchitecture.
Focusing on early emerging lineage-specific 
genes, we validated the spatiotemporal expres- 
sion of excitatory layer-specific markers (Fig. 
2, G and H, and fig. S3). We observed that 
broad classes of excitatory neurons in the Ex1,
Ex2, and Ex3 trajectory branches are restricted
to specific cortical layers during the second
trimester of gestation. Moreover, several
markers of L4 neurons, such as hippocalcin (HPCA)
and gremlin 2 (GREM2), are expressed in a
layer-restricted manner during the second
trimester of gestation, suggesting that L4 neuronal
identity starts to be specified early in
development. The layer identity of most excitatory
neurons emerges by birth (fig. S3) based on the 
lineage-specific signatures that we find specify 
human cortical neurons and their segregation 
to cortical layers. 


Dissection of glial and non-neural lineages 


lineage- and branch-specific genes with transient and burst expression patterns.
(F) Number of transient and burst genes in specific lineages and branches.

We further focused on the analysis of glial
lineages, including astrocytes and
oligodendrocytes. We re-clustered glial progenitors,
oligodendrocyte precursor cells (OPCs),
oligodendrocytes, and astrocytes and performed
trajectory analysis (Fig. 3A). We identified two
types of astrocytes: fibrous astrocytes with high
expression of glial fibrillary acidic protein (GFAP)
and protoplasmic astrocytes with low expression
of GFAP and high expression of glutamate
transporter GLAST (SLC1A3) (fig. S4A).
Next, we performed identification of lineage- 
specific genes in the manner described for neu- 
ronal lineages (data S2). We first focused on 
genes that were expressed at the divergence 
of astrocyte and oligo trajectory branches 
(Fig. 3B). We observed well-known transcrip- 
tion factors guiding commitment to the oligo 
and astrocyte lineages, including OLIG1, OLIG2,
ID4, and SOX9, as well as other putative regu- 
lators, such as the zinc finger protein ZCCHC24 
specific to the oligo lineage and a DNA binding 
protein, STOX1, enriched in astrocytes. When 
comparing fibrous and protoplasmic astro- 
cytes, we identified gene programs specific to 
these cell types (Fig. 3C). Genes up-regulated in 
protoplasmic astrocytes after birth and during 
the first year of life were mostly associated with 
the transport of glutamate and its metabolites 
(Fig. 3D), suggesting a maturation program to 
support neuronal firing during the early
postnatal period.

Fig. 3. Analysis of cortical glial lineages. (A) Clusters and trajectories of glial
progenitors, astrocytes, and oligodendrocytes. (B) Example genes specific to
oligodendrocyte and astrocyte lineage branches. (C) Examples of top
dynamically expressed genes specific to fibrous and protoplasmic astrocytes.

For oligodendrocytes, we observed
that genes up-regulated during the second and
third trimesters were associated with glial cell 
differentiation, whereas myelination genes were 
up-regulated after birth and continued to be 
expressed into adulthood (Fig. 3E). Analysis 
of microglia development (Fig. 3F) identified 
three cell trajectories (MG-1, -2, and -3), one 
of which (MG-3) was associated with highly 
activated microglia and was present in a small 
number of samples. These trajectories were 
confirmed by an alternative analysis using 
Slingshot (fig. S4B) (16). We focused on the non- 
activated microglia trajectories (MG-1 and 
MG-2), which were differentiated from each 
other by expression of a proinflammatory 
microglia marker, IKZF1, expressed in MG-2. IKZF1 
was the only gene differentiating MG-1 and 
MG-2, suggesting that these trajectories may 
represent two different states of the same mi- 
croglia cell type rather than different subtypes; 
therefore, we focused on genes developmen- 
tally expressed in both of these microglia cell 
clusters. By performing GO analysis of microglia- 
specific genes up-regulated at different de- 
velopmental stages, we observed complement 
genes associated with synaptic pruning up- 
regulated in microglia after birth and during 
the first year of life (Fig. 3G and fig. S4C). 
These findings suggest that the developmen- 
tal period between birth and the first year of 
life is a critical period of synaptic formation 
and plasticity that involves not only neuronal 
lineages but also protoplasmic astrocytes and 
microglia. Finally, we identified gene programs 
associated with the maturation of brain en- 
dothelial cells and pericytes (fig. S4, D to F). 
Our data suggest a coordinated maturation 
of neuronal and glial cell functions that en- 
sures proper formation and maintenance of 
neuronal circuits. 


Integration with single-cell open chromatin 
data and identification of lineage-specific 
gene regulatory networks 


Epigenetic regulation plays a crucial role in cor- 
tical neuron lineage commitment and specifica- 
tion. To identify lineage-specific transcriptional 
and epigenetic regulators of the cortical lineages 
identified in the snRNA-seq data, we leveraged 
the recently published single-nucleus assay 
for transposase-accessible chromatin using se- 
quencing (snATAC-seq) data from the develop- 
ing human cortex during prenatal and postnatal 
stages (4, 10, 11, 17). First, we combined snATAC-seq 
data from four datasets, obtaining 284,907 
snATAC-seq profiles from 57 tissue samples 
and 42 individuals across the second trimester 
and early postnatal stages of development, as 
well as adulthood. We then used Seurat to in- 
tegrate the resulting snATAC-seq data with 
our snRNA-seq data and mapped the integrated 
snATAC-seq data to the snRNA-seq clusters, 
UMAP space, and cell types (Fig. 4A; see meth- 
ods). We observed that the developmental ages 
for the snATAC-seq and snRNA-seq profiles were 
well aligned (Figs. 4A and 1D). Gene activity 
(open chromatin in the promoter and gene 
body) of cell type marker genes suggested that 
snATAC-seq profiles mapped to correspond- 
ing transcriptionally defined neuronal and 
glial cell types (fig. S5A). Next, we repeated the 
integration and mapping procedure for three 
major lineage classes: excitatory neurons, interneurons, 
and glia (astrocytes and oligodendrocytes) 
(Fig. 4, B to D, and fig. S5, B to D). 

(D) GO analysis of protoplasmic astrocyte-specific genes expressed during 
the first year of life. (E) Pathways enriched for oligo lineage-specific genes 
expressed at different developmental stages. (F) Analysis of microglia lineages. 
(G) Temporal patterns of developmental microglia genes. 

We omitted microglia and vascular cells be- 
cause of a low number of snATAC-seq profiles 
in these lineages. After mapping snATAC-seq 
data to the transcriptionally defined lineages, we 
selected snATAC-seq cells along each lineage 
branch (fig. S5, B to D). Not all lineages could 
be reliably recovered owing to the smaller size 
of the snATAC-seq dataset and the lack of key 
developmental stages, such as the third tri- 
mester. We therefore focused on lineages that 
had ATAC cells along the entire span of the 
trajectory, including four excitatory neuron 
lineages, five interneuron lineages, and both 
types of astrocytes and oligodendrocytes, as 
indicated in Fig. 4, B to D. Plots of lineage- 
specific gene activity over pseudotime demon- 
strated that we accurately mapped and selected 
lineage-specific snATAC-seq profiles (Fig. 4, 
B to D). Finally, we leveraged SCENIC+ (18), a 
recently developed algorithm that uses paired 
single-cell transcriptomic and open chromatin 
data to identify enhancer gene regulatory net- 
works (eGRNs) and candidate transcription 
factors that regulate expression of target genes 
in these networks. We applied SCENIC+ to the 
snRNA-seq and snATAC-seq profiles in each 
lineage to identify open chromatin regions cor- 
related with pseudotime, putative enhancers, 
candidate transcription factors (TFs) that bind 
them, and their association with lineage-specific 
dynamically expressed genes (data S3). In total, 
we identified 51 transcription factors regulat- 
ing 1373 lineage-specific genes through pre- 
dicted binding of 4846 regulatory chromatin 
regions. We observed networks regulated by 
previously known lineage-specific transcriptional 
regulators, such as SOX5 in deep-layer projection 
neurons (Fig. 4B), LHX6 in MGE-derived PV 
and SST interneurons (Fig. 4C and data S3), 
OLIG2 in oligodendrocytes, and SOX9 in astro- 
cytes (Fig. 4D). Additionally, we identified 
previously unrecognized (to the best of our 
knowledge) putative lineage-specific tran- 
scriptional regulators, such as BACH2, predicted 
to regulate several key deep-layer transcrip- 
tion factors in L5 neurons, including FOXP2 
and FEZF2, as well as NFIX and ZNF184 spe- 
cific to L2-3 neurons and regulating expres- 
sion of the upper-layer master transcription 
factor CUX2 (Fig. 4B). Our results also suggest 
the role of the transcription factor MAFB in 
parvalbumin interneuron specification (Fig. 4C), 
as well as of FOXN2 and RFX4 in determining 
the fate of oligodendrocytes and protoplasmic 
astrocytes, respectively (Fig. 4D). Our data shed 
new light on epigenetic control of neural lineage 
commitment and identify putative transcrip- 
tion factors and regulatory networks that de- 
fine the fate of specific human cortical neuronal 
and glial cell types. 
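
As a rough illustration of the gene activity measure used above (open chromatin in the promoter and gene body), the following sketch sums fragment counts from accessible peaks that overlap a gene body extended by an upstream promoter window. The 2-kb promoter window, the data layout, the coordinates, and all names (`Gene`, `Peak`, `gene_activity`) are assumptions made for illustration; this is not the implementation used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int
    count: int  # fragments observed in this peak for one cell


def activity_window(gene, promoter_bp=2000):
    """Return (start, end) of the gene body extended upstream by promoter_bp."""
    if gene.strand == "+":
        return max(0, gene.start - promoter_bp), gene.end
    return gene.start, gene.end + promoter_bp


def gene_activity(gene, peaks, promoter_bp=2000):
    """Sum counts of peaks overlapping the promoter + gene body window."""
    win_start, win_end = activity_window(gene, promoter_bp)
    total = 0
    for p in peaks:
        if p.chrom == gene.chrom and p.start < win_end and p.end > win_start:
            total += p.count
    return total


# Toy example with hypothetical coordinates and counts.
gene = Gene("GENE_A", "chr1", 10_000, 20_000, "+")
peaks = [
    Peak("chr1", 8_500, 9_000, 3),    # promoter: counted
    Peak("chr1", 15_000, 15_400, 2),  # gene body: counted
    Peak("chr1", 30_000, 30_500, 7),  # distal: ignored
    Peak("chr2", 10_500, 11_000, 4),  # other chromosome: ignored
]
score = gene_activity(gene, peaks)
```

Computing such a score per cell and per marker gene gives a value that can be compared with expression of the same gene in the transcriptionally defined cell types.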


Identification of region- and sex-enriched 
lineage-specific gene programs 


Given that we sampled our transcriptomic 
data from different cortical regions, we asked 
whether lineage-specific developmental gene 
expression profiles might be spatially defined 
and vary depending on cortical area. We focused 
on the frontal and prefrontal cortex (PFC) because 
we had the most complete sampling of 
these cortical areas across developmental stages 
(fig. S6A). We compared each neuronal and 
glial lineage trajectory in the PFC with the 
trajectories in all other cortical areas and 
identified PFC-enriched developmentally reg- 
ulated genes in each lineage (data S4). We 
observed more PFC-specific genes in excitatory 
neuron lineages, especially in intratelencephalic 
upper (L2-3) and deep-layer (L5-6-IT) neurons, 
as well as in astrocytes and oligodendrocytes, 
whereas most interneuron lineages and microg- 
lia expressed fewer PFC-specific genes (fig. S6B). 
After performing GO analysis for PFC genes 
specific to neuronal lineages, we observed en- 
richment in cell adhesion and synaptic transmis- 
sion pathways (fig. S6C). Analysis of glia-specific 
PFC genes demonstrated enrichment in differ- 
ent categories of biological pathways asso- 
ciated with cell division and cell migration 
(fig. S6D). Examples of neuronal PFC genes in- 
cluded synaptojanin 2 binding protein (SYNJ2BP) 
regulating receptor localization and signal trans- 
duction at the synapse and the cation channel 
TRPC7 (fig. S6E). PFC fibrous astrocytes up- 
regulated R-spondin 2 (RSPO2) and frizzled 
class receptor 8 (FZD8), which both partici- 
pate in Wnt signaling and cell migration. Our 
results suggest cortical areal differences in 
lineage-specific transcriptomic programs, with 
synaptic genes up-regulated in neuronal cell 
types and cell division and cell migration programs 
activated in glial cells in the developing PFC. 

Network plots (eGRNs) display transcription factors predicted to bind enhancer 
regions to regulate lineage-specific transcriptional programs. Edge colors indicate 
regulation by different transcription factors. Top 20 genes on the basis of the 
predicted confidence of interaction are shown for each transcription factor network. 

PFC-specific expression of synaptic genes in 
neuronal cell types suggests regional specifica- 
tion of neuronal circuits during development. 
We next asked whether the development of 
specific cellular lineages is modulated in a sex- 
dependent manner. For each lineage analyzed, 
we selected female and male nuclei (Fig. 5A 
and fig. S7, A and B) and identified dynamically 
expressed genes enriched during either female 
or male development. In total, we identified 740 
female-enriched genes and 312 male-enriched 
genes (data S5). A smaller fraction of male 
genes showed sex enrichment in a lineage- 
specific manner (181/312, 58%) compared with 
female-enriched genes (510/740, 69%). Despite 
several top sex-enriched genes located on X 
and Y chromosomes [including X inactive 
specific transcript (XIST) and protocadherin 11 
Y-linked (PCDH11Y)], sex-enriched genes were 
evenly distributed across all chromosomes (fig. 
S7C), suggesting that sex-dependent develop- 
mental modulation of gene expression is not 
directly dependent on transcription from the 
sex chromosomes. We next performed GO analy- 
sis of female- and male-enriched genes, focusing 
on the neuronal, astrocyte, and oligodendrocyte 
lineages for which we had a large number of 
samples and nuclei from both sexes. We observed 
substantial differences between the biological 
processes associated with female-enriched 
genes and male-enriched genes: female genes 
were involved in developmental processes, including 
cell adhesion, central nervous system 
(CNS) development, synaptic transmission, and 
membrane potential regulation (Fig. 5B), whereas 
male genes were associated with RNA metabolism 
and translation (Fig. 5C). 

[Fig. 5A panels: UMAP embeddings (UMAP 1 versus UMAP 2) of female and male nuclei for excitatory neurons and interneurons.] 

Fig. 5. Analysis of sex-specific developmental programs in human cortex. 
(A) Female and male developmental trajectories of excitatory neurons, interneurons, 
astrocytes, and oligodendrocytes. (B and C) GO analysis of female-enriched and 

Only a small 
number of male-specific genes, such as Y-box 
binding protein 1 (YBX1) and leucine rich repeat 
and Ig domain containing 1 (LINGO1), 
were associated with developmental processes; 
however, these genes were enriched across 
multiple male lineages (fig. S7D). We classified 
sex-enriched genes according to their dynamic 
expression pattern and saw that the majority 
were expressed transiently (Fig. 5D), with >90% 
reaching medium expression during the sec- 
ond trimester (data S5). This suggests early and 
transient sex-dependent developmental modu- 
lation of cortical lineages. Sex-enriched genes 
were more abundant in excitatory neuron line- 
ages than in interneurons (Fig. 5E) and were 
also abundant in female fibrous astrocytes. 
Several top lineage-specific female-enriched 
genes were associated with neuronal, glial, and 
endothelial development (Fig. 5F and fig. S7E). 
These included nuclear hormone receptor/ 
transcription factor RORA in L2-3 neurons, 
synaptic protein neurexophilin 3 (NXPH3) in 
L6 neurons, transcription factor HES4 in fi- 
brous astrocytes, and an actin filament depoly- 
merization enzyme, MICAL3, in oligodendrocytes. 
Overall, our results point to modulation of neu- 
ronal and glial developmental programs during 
second trimester female brain development. 
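
The lineage-specific fractions quoted in this section follow directly from the reported gene counts. A minimal check, using only the numbers given in the text (the variable and function names are illustrative):

```python
# Counts reported in the text: sex-enriched genes and the subset
# whose enrichment was lineage-specific.
counts = {
    "female": {"total": 740, "lineage_specific": 510},
    "male": {"total": 312, "lineage_specific": 181},
}


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


fractions = {
    sex: percent(c["lineage_specific"], c["total"]) for sex, c in counts.items()
}

# Ratio of female-enriched to male-enriched gene counts.
female_to_male_ratio = counts["female"]["total"] / counts["male"]["total"]
```

Both rounded percentages match the values in the text, and the ratio shows that female-enriched genes outnumber male-enriched genes by more than twofold.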


Enrichment of lineage-specific developmental 
gene programs for risk factors of brain disorders 


Once we defined lineage- and sex-specific de- 
velopmental gene programs in human cortical 
cell types, we sought to investigate how these 
transcriptional programs may be affected in 
neurodevelopmental, psychiatric, and neurodegenerative 
disorders. 

Velmeshev et al., Science 382, eadf0834 (2023) 

[Fig. 5, B to E: bar plots of GO terms for female- and male-enriched genes (x axis, -log10(FDR)) and of sex-enriched gene counts by expression pattern and lineage (# genes).] 

We compiled all lineage-specific 
gene signatures for excitatory neurons, 
astrocytes, oligodendrocytes, interneurons, microglia, 
endothelial cells, and pericytes, in total 
obtaining 2796 distinct genes, and divided 
them into five groups according to their age 
of expression onset (50% of max expression). 
We then overlapped this gene list with lists of 
rare gene variants associated with the risk of 
ASD from the Simons Foundation Autism Re- 
search Initiative (SFARI) Gene database (19), 
as well as genome-wide association study genes 
for the risk of SCZ (20), bipolar disorder (BPD) 
(21), and Alzheimer’s disease (AD) (22) (Fig. 6A 
and data S6). We observed a large enrichment 
for genes associated with risk for ASD, SCZ, 
and BPD in the second trimester, with expres- 
sion of ASD and BPD risk genes extending to 
the third trimester. Enrichment for risk genes of 
neurodevelopmental disorders dropped during later stages 
of development. Enrichment of AD risk genes 
remained mostly flat and only slightly above 
the significance level, demonstrating a pattern 
different from neurodevelopmental and psy- 
chiatric disorders. We next analyzed enrichment 
of disease risk genes across cortical lineages (Fig. 
6B). We were able to detect significant enrich- 
ment for ASD risk genes in L5-6-IT and L5 
neurons, whereas AD risk genes were enriched 
in microglia. We focused on ASD because we 
observed the strongest enrichment for the risk 
of this disorder among developmentally regu- 
lated genes and because a large amount of ge- 
netic risk data are available for this disorder. 
We observed developmental enrichment of ASD 
risk genes with a SFARI score of 2 or 3 but not 
a score of 1 and did not find enrichment in 
syndromic ASD genes (Fig. 6C). We observed a 
significant enrichment among high-confidence 
ASD risk genes (ASD-HC) based on the TADA 
(transmitted and de novo association) analysis 
(23). We conclude that the genetic burden of 
ASD has the potential to affect the development 
of specific neuronal cell types, especially deep-layer 
intratelencephalic projection neurons and 
L5 neurons. 

male-enriched genes. (D) Dynamic expression patterns of sex-enriched genes. 
(E) Sex enrichment of developmental gene expression across neuronal and glial 
lineages. (F) Examples of top female-enriched genes in specific lineages. 

We next explored enrichment of 
ASD risk genes in sex-specific developmental 
programs. We observed strong enrichment of 
female-specific developmental genes in both 
SFARI and ASD-HC gene lists (Fig. 6D). Male- 
specific genes were less frequently found among 
SFARI genes, and we did not find a meaning- 
ful overlap between male-enriched and ASD-HC 
genes. This finding points to a strong enrichment 
of the genetic risk of ASD among developmental 
genes that are more highly expressed in female 
cells. SFARI genes were enriched in female cells 
across multiple neuronal cell types, especially 
the subplate and L6 excitatory neurons, as well 
as oligodendrocytes and fibrous astrocytes, but 
not in microglia or vascular cell types (Fig. 6E). 
This suggests a role for the subplate in the patho- 
genesis of ASD. Examples of female-specific 
ASD-HC genes included the subplate-specific 
transcription factor NR4A2 and the neuronal 
transcription factor MEF2C that were up- 
regulated in female subplate cells, as well as a 
regulator of axon guidance and synaptogenesis, 
neurexin 2 (NRXN2), and PCDH15 encoding a 
cell adhesion molecule in female L6 neurons 
(Fig. 6F). Our findings provide strong evidence 
supporting the ASD female protective effect 
hypothesis (24) and suggest that fine-tuning of 
cortical cell lineages by sex-specific develop- 
mental programs can contribute to the male 
bias in the pathogenesis of ASD. 
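
The overlap analysis in this section can be sketched in two steps: assigning each gene an onset age (the first sampled age at which expression reaches 50% of its maximum, as stated above) and testing whether a gene group overlaps a risk-gene list more than expected by chance. The one-sided hypergeometric test, the age labels, the background size, and all numbers below are assumptions for illustration; the text does not specify the statistical test used for the enrichment.

```python
from math import comb


def onset_age(ages, expression, fraction=0.5):
    """First age at which expression reaches `fraction` of its maximum."""
    threshold = fraction * max(expression)
    for age, value in zip(ages, expression):
        if value >= threshold:
            return age
    raise ValueError("expression never reaches the threshold")


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are risk genes."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy expression values along ordered developmental ages (hypothetical labels).
ages = ["2nd trimester", "3rd trimester", "0-1 years", "1-10 years", "adult"]
early_gene = [4.0, 5.0, 3.0, 2.0, 1.0]
late_gene = [0.5, 1.0, 2.0, 6.0, 8.0]
onsets = (onset_age(ages, early_gene), onset_age(ages, late_gene))

# Toy enrichment: 2000 background genes, 100 risk genes,
# 50 genes in the tested onset group, 10 of which are risk genes.
p_value = hypergeom_sf(k=10, N=2000, K=100, n=50)
```

With an expected overlap of 2.5 genes under the null in this toy setting, an observed overlap of 10 yields a small p-value; in practice such p-values would also need correction for testing multiple groups and disorders.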


Discussion 


By generating snRNA-seq data from the de- 
veloping human cortex and integrating the 
findings with previously published datasets, 
we performed a large-scale unbiased transcrip- 
tomic analysis of human cortical development 
throughout the life span. By reconstructing 
single-cell trajectories and identifying genes that 
are expressed in a lineage-specific manner, we 
created a compendium of developmental programs 
for all the major cortical cell types. 

Fig. 6. Lineage enrichment of brain-disorder risk genes. (A) Enrichment of 
disease risk genes across developmental stages. (B) Disease risk gene enrichment 
across lineages and lineage branches of neuronal, glial, and vascular cell types. Red 
squares indicate statistical significance. (C) Enrichment of lineage-specific develop- 

By 
integrating our data with published single-cell 
chromatin accessibility datasets, we identified 
enhancer gene regulatory networks and tran- 
scription factors that are predicted to control 
the commitment and differentiation of specific 
cortical neural lineages. In addition, we char- 
acterized sex- and brain region-specific gene 
programs that are used by particular lineages of 
cortical cell types. We find that female-enriched 
genes are associated with neurodevelopmental 
processes, whereas male-enriched genes are 
involved in protein translation control, sug- 
gesting sex-specific variation of developmental 
trajectories. We also find that developmental 
gene programs used by cortical excitatory neu- 
rons, astrocytes, and oligodendrocytes are the 
most region-specific. Interneurons, in contrast, 
express few region-specific genes during devel- 
opment, consistent with data on regional sig- 
natures of cortical cell types in the mature 
human brain (25). 

We investigated the enrichment of genetic 
risk factors for brain disorders, focusing on ASD, 
and found that the developmental programs of 
both deep-layer intratelencephalic and extrate- 
lencephalic projection neurons are enriched for 
ASD risk genes. These data are in agreement 
with previous reports of enrichment of ASD 
genes in deep-layer cortical neurons during 
mid-gestation (26, 27) but also suggest that 
both deep-layer neurons projecting to other 
cortical areas and to subcortical locations could 
be affected. We previously reported that upper- 
layer cortical excitatory neurons are most 
dysregulated in the cortex of idiopathic ASD 
patients (28). It would be an important future 
direction to elucidate how changes in pan-excitatory 
neuron programs during development 
can culminate in dysfunction of specific 
cortical neuronal populations, such as L2-3 
neurons. It would also be valuable to explore 
whether the molecular pathology of upper- 
layer neurons is specific to idiopathic ASD, 
and whether it is driven by common gene var- 
iants rather than by rare variants with strong 
effect sizes (29). In addition, we observed a 
strong enrichment of ASD genetic risk factors 
among female-specific developmental genes. 
As these female-enriched ASD risk genes have 
higher expression in females during cortical 
development, it is possible that this higher 
baseline expression renders the female brain 
more resistant to genetic insults causing au- 
tism, especially to haploinsufficiency that can 
reduce transcript or protein expression by af- 
fecting one of the two alleles. This finding might 
explain the 4:1 male-to-female ratio of individ- 
uals affected by ASD and suggests the impor- 
tance of sexual dimorphism in human brain 
development. However, the role of sex hormones 
in the increased male-to-female ratio in ASD is 
not to be discounted, and additional studies are 
needed to reconcile the role of early develop- 
ment and later sex-specific processes in the 
pathogenesis of autism. Our preliminary find- 
ings indicate the cell type-specific risk of BPD 
and SCZ, but more detailed genetic studies are 
needed to further dissect cell type and develop- 
mental stage vulnerability. The data generated 
here may help enable fine-grained understand- 
ing of human brain development and provide 
insight into mechanisms of neurodevelopmen- 
tal disorders. Interactive visualization of our 
data is available on the University of California 
Santa Cruz (UCSC) Cell Browser 
(https://pre-postnatal-cortex.cells.ucsc.edu). 

mentally regulated ASD risk genes of different categories and evidence scores. 
(D) Overlap between ASD risk genes and female- and male-enriched developmental 
gene programs. (E) Enrichment of sex-specific genes across cortical lineages. 
(F) Temporal patterns of female-enriched genes that are known risk factors for ASD. 


Our study, however, is limited by the tech- 
nical difficulty of integrating snRNA-seq and 
snATAC-seq data as well as by the lack of 
inclusion of earlier developmental stages, such 
as the first trimester, owing to challenges of in- 
tegrating single-cell RNA sequencing (scRNA-seq) 
and snRNA-seq datasets. Overcoming these 
obstacles will allow for even more compre- 
hensive future understanding of how specific 
human cortical lineages develop. Moreover, 
single-cell epigenetic analyses of human brain 
development would be necessary to deter- 
mine whether imprinting plays a role in reg- 
ulating sex enrichment of developmentally 
expressed genes. 


Materials and methods summary 


Brain tissue samples were sectioned using a 
cryostat to collect coronal cortical sections, lysed, 
and ultracentrifuged to isolate nuclei. Nuclei 
were captured using 10x Genomics Single Cell 3’ 
v.2 kits. 

Raw sequencing data were processed using 
10x Genomics CellRanger, with reads aligned 
to an unspliced human transcriptome to capture 
reads from pre-mRNAs. Dataset integration 
was performed using Harmony based on 10x 
chemistry, and clustering and UMAP embedding 
were carried out with Seurat. Monocle 3 
was used to reconstruct lineage trajectories, 
and custom scripts were used to identify lineage- 
specific dynamically expressed genes (see sup- 
plementary materials). 

snATAC-seq data were integrated with 
snRNA-seq data using canonical correla- 
tion analysis in Seurat, after which different 
snATAC-seq chemistries were integrated using 
Harmony. Enhancer gene regulatory networks 
were identified using SCENIC+. 

INTRODUCTION: The intricate gene-regulatory 
mechanisms governing the diverse cell types 
within the brain are paramount to comprehend- 
ing its functions in both health and disease. The 
recent surge in high-throughput epigenomic 
profiling has heralded groundbreaking revela- 
tions into these gene-regulatory frameworks. 
Particularly, DNA methylation, in conjunction 
with the three-dimensional (3D) chromatin 
formations, underpins the fundamentals of 
gene regulation. This study comprehensively 
analyzed DNA methylation and chromatin con- 
formation in adult human brain cells span- 
ning multiple regions. 


RATIONALE: Gene expression in brain cells is 
modulated by DNA methylation and chromatin 
conformation, and a profound correlation 
exists between these processes. We can better 
understand the human brain’s gene-regulatory 
intricacies by deeply probing these epigenomic 
characteristics in single cells. This study set 
out to meticulously chart the DNA methyl- 
ation landscapes and chromatin structures 
in adult human brain cells. 


RESULTS: The study revealed epigenome-based 
brain cell-type classifications, elucidating di- 
verse categorizations grounded on their epi- 
genomic imprints. Discernible differences in 
the chromatin contact distances between neurons 
and non-neurons were discovered. 

[Figure panels: single-cell DNA methylation and chromatin conformation profiling (A and B compartments, gene body, cis-regulatory element, (un)methylated cytosine); cross-species comparison.] 

Insight into the 3D genome organization of brain 
cells was gleaned from compartments, domains, 
[...] chromatin accessibility and gene expression. Furthermore, 
there emerged a distinctive cell-type specificity 
in the 3D genome features. The study also 
unearthed patterns of cell-specific DNA methyl- 
ation and its overarching implications on the 
gene-regulatory networks. Regional disparities 
in cortices and basal ganglia were uncovered. 
A comparative exploration underscored the 
conservation of brain cell types and DMRs 
between humans and mice. Finally, the in- 
ception of single-cell methylation barcodes 
(scMCodes) showcased immense promise in 
precisely identifying human brain cell types. 


CONCLUSION: This comprehensive investiga- 
tion presents a single-cell DNA methylation 
and 3D genome structure atlas of the human 
brain. It illuminates the cell-type specificity 
and diverse epigenetic architectures of cells 
across the brain. This epigenomic atlas prom- 
ises to be a valuable resource, fueling further 
discoveries in brain cell diversity, gene regu- 
lation mechanisms, and the genesis of new 
genetic tools. 

DNA methylation and chromatin conformation profiling in the human brain. DNA methylation and chromatin conformation were probed at single-cell resolution in 
517 thousand cells from 46 regions of three adult human brains. The comprehensive investigation allowed in-depth analysis of cell-type complexity, epigenetic diversity for 
gene regulation, comparison between human and mouse brains, and construction of brain cell-type methylation barcodes. 

Delineating the gene-regulatory programs underlying complex cell types is fundamental for 
understanding brain function in health and disease. Here, we comprehensively examined human brain 
cell epigenomes by probing DNA methylation and chromatin conformation at single-cell resolution in 
517 thousand cells (399 thousand neurons and 118 thousand non-neurons) from 46 regions of three 
adult male brains. We identified 188 cell types and characterized their molecular signatures. Integrative 
analyses revealed concordant changes in DNA methylation, chromatin accessibility, chromatin 
organization, and gene expression across cell types, cortical areas, and basal ganglia structures. We 
further developed single-cell methylation barcodes that reliably predict brain cell types using the 
methylation status of select genomic sites. This multimodal epigenomic brain cell atlas provides new 
insights into the complexity of cell-type-specific gene regulation in adult human brains. 


High-throughput epigenomic profiling 
has been used to elucidate the gene- 
regulatory programs underlying tre- 
mendous cellular complexity in brains 
(1-3). 5'-Methylcytosines (5mCs) are the 
most common modified bases in mammalian 
genomes. Most 5mCs in vertebrate genomes 
occur at cytosine-guanine dinucleotides (CpGs). 
CG differentially methylated regions (DMRs) 
are often considered indicative of cis-regulatory 
elements (CREs) (4, 5). In vertebrate neuronal 
systems, however, 5mCs are also abundantly 
detected in non-CG (or CH, where H = A, C, or T) 
contexts (6). Both CG and CH methylation (mCG 
and mCH) are highly dynamic during brain 
development and show cell-type specificity 
(1, 4, 7). They are also essential for gene regu- 
lation and brain functions (8). In addition, gene 
regulation requires proper three-dimensional 
(3D) conformation of chromatin folding, which 
is organized into active (A) or repressive (B) 
compartments, topologically associating do- 
mains (TADs), and chromatin loops (9). These 
3D structures facilitate the interaction be- 
tween gene promoters and their regulatory 
elements, providing additional yet critical 
layers of regulatory mechanisms. DNA methyl- 
ation and chromatin conformation interplay 
and coordinate in regulating gene expression, 
and these processes are highly correlated (3). 
Surveys on these epigenomic features of brain 
cells can deepen our understanding of gene 
regulation underlying the complexity of hu- 
man brains. Here, we comprehensively pro- 
filed both DNA methylation and chromatin 
conformation in adult human brain cells from 
cortical and subcortical regions using single- 
nucleus epigenomic sequencing technologies. 
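
To make the CG versus CH distinction concrete, the sketch below classifies each forward-strand cytosine by the base that follows it and computes mCG and mCH fractions from per-site methylation calls. The input format, the toy sequence, and the function names are hypothetical; real pipelines also handle the reverse strand, coverage filtering, and conversion efficiency, which are omitted here.

```python
def cytosine_context(seq, pos):
    """Return 'CG' or 'CH' for a forward-strand cytosine at seq[pos], else None."""
    if seq[pos] != "C" or pos + 1 >= len(seq):
        return None
    nxt = seq[pos + 1]
    if nxt == "G":
        return "CG"
    if nxt in ("A", "C", "T"):  # H = A, C, or T
        return "CH"
    return None  # ambiguous base such as N


def methylation_fractions(seq, calls):
    """calls maps position -> (methylated_reads, total_reads)."""
    totals = {"CG": [0, 0], "CH": [0, 0]}
    for pos, (meth, cov) in calls.items():
        ctx = cytosine_context(seq, pos)
        if ctx is not None:
            totals[ctx][0] += meth
            totals[ctx][1] += cov
    return {
        ctx: (m / c if c else float("nan")) for ctx, (m, c) in totals.items()
    }


# Toy sequence and read counts (hypothetical values).
seq = "ACGTCATCCG"
calls = {1: (8, 10), 4: (1, 10), 7: (0, 5), 8: (9, 10)}
fractions = methylation_fractions(seq, calls)
```

Pooling reads by context in this way gives one global mCG and one global mCH fraction per cell, the kind of summary that later sections compare across cell types.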


Epigenome-based brain cell-type taxonomies 


We dissected 46 brain regions encompassing 
brain structures of the cerebral cortex (CX, 
22 regions), basal forebrain (BF, two regions), 
basal nuclei (BN, 11 regions), hippocampus 
(HIP, five regions), thalamus (THM, two re- 
gions), midbrain (MB, one region), pons (PN, 
one region), and cerebellum (CB, two regions) 
(Fig. 1A, fig. S1A, and tables S1 and S2). Most 
regions had three biological replicates from 
the three adult male donors (table S3) except 
two amygdala regions (BM and CEN, two replicates 
each) (fig. S1A and table S1). Fluorescence-activated 
nuclei sorting (FANS) was used to 
isolate 90% NeuN-positive and 10% NeuN-negative 
cells in each sample (fig. S1A). We 
then used snmC-seq3 (“mC”) (10) to profile 
DNA methylation (DNAm) across all 46 brain 
regions at the single-cell level. Additionally, we 
used snm3C-seq (“m3C”) (3) to simultaneously 
examine single-cell DNA methylation and chromatin 
conformation from 17 brain regions spanning 
CX, BF, and BN (Fig. 1B and fig. S1A). After 
rigorous quality control, 378,940 mC and 
145,070 m3C nuclei were confirmed suitable 
for further analysis (fig. S1B). Each mC cell 
produced an average of 0.94 million filtered 
reads, and each m3C cell produced ~2.20 mil- 
lion reads with 406 thousand chromatin con- 
tacts. This data quality allowed us to reliably 
measure DNAm across genomic features (fig. 
S1C), identify variable methylation regions, and 
pinpoint TADs and chromatin loops across dif- 
ferent brain cell types. 

Through iterative clustering of the mC 
dataset (see the materials and methods), nuclei 
were first divided into three classes: telen- 
cephalic excitatory neurons, inhibitory and/or 
nontelencephalic neurons, and non-neuronal 
cells (Fig. 1, C and D). These were further 
divided into 40 major types and 188 subtypes 
(Fig. 1D; fig. S2, A to C; and tables S4 and S5). 
The cell types were annotated based on CH- 
hypomethylated gene markers for neuronal 
cells and CG-hypomethylated markers for non- 
neuronal cells (see the materials and methods). 
All major types and subtypes were conserved 
across donors, although there were minor 
variations in the proportions of certain cell 
types (Fig. 1D and fig. S2C). The robust den- 
drograms demonstrated similarities between 
major types and subtypes (Fig. 1D, fig. S2C, 
and materials and methods). Telencephalic 
excitatory and inhibitory and/or nontelence- 
phalic neurons are well separated from non- 
neuronal cells, with each type forming a 
specific clade, except CB and Purkinje (PKJ) cells, 
which were grouped with the non-neuronal 
cell types, likely because of their similar global 
CG- and CH-methylation fractions (Fig. 1H and 
fig. S4A). 

Non-neuronal major types were distributed even- 
ly across brain structures, whereas neuronal 
ones exhibited considerable spatial specificity 
(Fig. 1, D and F). Most telencephalic excitatory 
neurons were grouped by location (Fig. 1D). 
Hippocampal excitatory neurons were grouped 
based on their substructures [CA1, CA3, and 
dentate gyrus (DG)]. Cortical excitatory neu- 
rons were clustered by their cortical layers (e.g., 
L2/3) and projection types [e.g., intratelence- 
phalic (IT)-projecting neurons (table S4)]. Basal 
nuclei excitatory neurons, predominantly from 
the amygdala, form the Amy-Exc group. Telen- 
cephalic inhibitory neurons manifest as 11 major 
types, primarily from cortical areas (Pvalb, 
Pvalb-ChC, Sst, Lamp5, Lamp5-Lhx6, Sncg, 
and Vip) and basal nuclei or basal forebrain 
(MSN-D1, D2, Foxp2, and Chd7). In the thal- 
amus, one excitatory and two inhibitory major 
types were identified. One inhibitory major type, 
THM-MB, shared similar DNA methylation 
profiles with a small population of midbrain 
cells. The other inhibitory major type, THM-Inh, 
was very rare (361 cells or 0.07% of the entire 
dataset), possibly originating from the habe- 
nular nuclei of the thalamus due to dissection 
contamination (fig. S2D). Pontine nucleus neu- 
rons from Pons constituted a specific major 
type (PN). The cerebellum contained two dis- 
tinct major types: the rare cell type PKJ (867 
cells or 0.17%) and cerebellar granule cells (CB). 
Finally, the SubCtx-Cplx major type, found in 
the basal nuclei and midbrain, was notable 
for its heterogeneity: Its subtypes consisted of 
both excitatory and inhibitory cells (Fig. 1E) 
and featured highly variable DNAm of the genes 
of neurotransmitter receptors, transporters, and 
neuropeptides (fig. S2E). 

The cell types determined from single-nucleus 
DNAm profiles were corroborated with single- 
nucleus transcriptome sequencing (snRNA-seq) 
and single-nucleus chromatin accessibility se- 
quencing (snATAC-seq) data from the same 
human brains [see the materials and methods 
and companion manuscripts Siletti et al. (11) 
and Li et al. (12)]. Integrative analysis revealed 
the strong correspondence between cell types 
determined using different molecular modal- 
ities (fig. S3A). All epigenome-based cell sub- 
types correspond well with transcriptome-based 
clusters (fig. S3B), although the transcriptome- 
based clusters were derived from ~10 times more 
cells and from ~2 times more brain regions. 

Global methylation varied among major 
types: 77.7 to 85.5% for mCG and 0.8 to 10.7% 
for mCH. Non-neuronal and granule cell (DG 
and CB) major types had the lowest global 
fractions in both mCG and mCH (Fig. 1H and 
fig. S4A), consistent with the previous study in 
mice (1). Cortical inhibitory neurons had the 
highest mCG, whereas certain nontelence- 
phalic neurons from the thalamus, midbrain, 
and pons exhibited the highest mCH (Fig. 1H 
and fig. S4A). Cell-type global methylation cor- 
responded with the gene expression of DNAm 
readers and modifiers (Fig. 1I and fig. S4B). 
The expression levels of MECP2 and DNMT3A, the 
major mCH reader and writer, were posi- 
tively correlated with global mCH [Pearson 
correlation coefficient (PCC) = 0.39 and 0.35, 
respectively] and weakly with mCG (PCC = 
0.17 and 0.08, respectively) (Fig. 1I and fig. S4B). 
The DNA methyltransferase DNMT1 had a high 
positive correlation (PCC = 0.63) between its 
expression and mCG across cell types (Fig. 1I), 
matching its role as the major mCG maintainer 
in mature neurons (13). We observed an even 
higher correlation between DNMT1 expression 
and mCH (PCC = 0.72; Fig. 1I), although it is 
thought to have little effect on mCH (14). This 
implied an unknown relationship between 
DNMT1 and mCH or some yet-to-be-discovered 
factor influencing both DNMT1 expression 
and mCH. 
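
The correlations reported above are Pearson correlation coefficients computed
across cell types. The following is a minimal sketch of that calculation, not
the authors' pipeline. The function name `pearson_cc` and the five example
values are hypothetical and are not data from the study.

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center each variable
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical per-cell-type values (illustrative only, not study data):
# expression of a methylation writer and the global mCH fraction.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
global_mch = [0.010, 0.030, 0.020, 0.060, 0.080]

pcc = pearson_cc(expression, global_mch)
```

Each entry stands for one cell type, so the coefficient summarizes covariation
across types rather than across individual nuclei.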

Using the improved scHiCluster (15) for m3C 
cells, we were able to separate all major types 
except MSN-D1 and D2 solely through chroma- 
tin contacts (Fig. 1G). This also highlighted the 
diversity of chromatin conformation across brain 
regions (fig. S2F). To ensure the consistency of 
annotations between the two datasets, we co- 
clustered mC and m3C cells iteratively and then 
transferred cell-type annotations from mC to 
m3C cells (table S5 and materials and methods). 


Differences in contact distance between 
neurons and non-neurons 


To investigate cell-type-specific genome fold- 
ing at different scales, we first examined the 
proportion of contacts per cell at genome dis- 
tances. Neurons displayed enrichment of in- 
teractions at a shorter distance (200 kb to 2 Mb), 
whereas mature oligodendrocytes and non- 
neural cells were enriched for longer-range 
contacts (20 Mb to 50 Mb). Astrocyte and oli- 
godendrocyte progenitor cells exhibited enrich- 
ment in both ranges (Fig. 2, A to C, and fig. S5, 
A and B). Within neuronal cells, cortical ex- 
citatory and subcortical neurons had more 
shorter-range interactions than cortical inhib- 
itory cells (P < 1 x 10°°°°, Wilcoxon rank-sum 
test; Fig. 2, A and B). We observed similar 
patterns in previous datasets from the mouse 
(7) and from a different technique [Dip-C (6); 
fig. S5C], signifying the conservation of these 
patterns. The enrichment of shorter-range 
contacts in neurons was observed across the 
whole genome, including both neuronal and 
non-neuronal gene loci (fig. S5D). The ratio 
between shorter and longer interactions highly 
correlated with global gene expression activity 
of cells (PCC = 0.87; fig. S5E) and aligned with 
the sizes of nuclei [L5-ET > other cortical excit- 
atory neurons > cortical inhibitory neurons > 
non-neurons (17)]. These results demonstrated 
that the contact distance spectrum, tradition- 
ally associated with cell cycle phases (18), can 
also vary based on cell type in nondividing cells. 
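
The short/long contact comparison can be illustrated with a small sketch. It
assumes the two distance windows quoted above (200 kb to 2 Mb and 20 Mb to
50 Mb), half-open window boundaries, and a pseudocount of 1; the boundary
handling, the pseudocount, and the function name are assumptions made for
illustration and are not taken from the materials and methods.

```python
import numpy as np

SHORT_RANGE = (200_000, 2_000_000)      # 200 kb to 2 Mb
LONG_RANGE = (20_000_000, 50_000_000)   # 20 Mb to 50 Mb

def log2_short_long_ratio(distances, pseudocount=1.0):
    """log2 ratio of short-range to long-range intrachromosomal contacts
    for one cell, given the genomic distance (bp) of each contact."""
    d = np.asarray(distances)
    n_short = np.count_nonzero((d >= SHORT_RANGE[0]) & (d < SHORT_RANGE[1]))
    n_long = np.count_nonzero((d >= LONG_RANGE[0]) & (d < LONG_RANGE[1]))
    # The pseudocount (an assumption) avoids division by zero in sparse cells.
    return float(np.log2((n_short + pseudocount) / (n_long + pseudocount)))

# Hypothetical contact distances for two cells (illustrative only).
neuron_like = [300_000, 500_000, 1_000_000, 30_000_000]
glia_like = [500_000, 25_000_000, 30_000_000, 40_000_000]

ratio_neuron = log2_short_long_ratio(neuron_like)  # (3 + 1) / (1 + 1) -> 1.0
ratio_glia = log2_short_long_ratio(glia_like)      # (1 + 1) / (3 + 1) -> -1.0
```

A positive value indicates a cell enriched for shorter-range contacts, as
described for neurons; a negative value indicates enrichment of longer-range
contacts.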

We next investigated the relationship be- 
tween enriched longer-range or shorter-range 
chromatin interactions and chromatin com- 
partments or domains. We identified chromo- 
some compartments within each major type 
at 100-kb resolution (Fig. 2D) and domains at 
25-kb resolution. Enriched longer-range inter- 
actions in non-neurons were predominantly 
intracompartmental, especially between B com- 
partment regions. Shorter-range interactions 
in neurons were also enriched within the same 
compartments (fig. S5, F and G). In total, we 
observed an enrichment of intracompartment 
interactions and a depletion of intercompart- 
ment interactions in non-neurons (fig. S5, H to 
K, and materials and methods), indicating a 
stronger compartment strength. By contrast, 
the enrichment of short-range interactions in 
neurons was found to be both intradomain 
and interdomain (fig. S5, L and M). 


Compartments, domains, and loops in brain 
cell types 


We postulated that the methylation status of 
two genome loci would covary if they were 
physically proximate. The comethylation coef- 
ficient matrices, depicting the correlation of 
methylation between genomic bins across sin- 
gle cells, displayed plaid patterns echoing the 
compartment structures of chromatin contacts 
(Fig. 2E and fig. S6A). This suggested that the 
genome was segregated into local comethyla- 
tion domains, which constituted two sets with 
opposite methylation diversities. A similar co- 
regulation structure was also observed for 
chromatin accessibility in the scATAC-seq 
data (19), reinforcing evidence for genome com- 
partmentalization. Exploring the link between 
DNA methylation and 3D genome architec- 
ture, we observed that correlations between 
the strengths of chromatin interactions and 
the average methylation fractions of their an- 
chors were also associated with chromosome 
compartments (fig. S6B), where negative cor- 
relations occurred more frequently in the ac- 
tive compartment (P < 1 x 10 °™: fig. S6C). 
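
A comethylation matrix of the kind described above can be sketched as the
bin-by-bin correlation of methylation fractions across single cells. The
example below uses simulated data in which two sets of bins vary in opposite
directions; the simulation parameters and variable names are hypothetical, and
the split by the sign of the leading eigenvector is one common way to read out
a two-set structure, not necessarily the procedure used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 200

# Simulated data (not study data): one latent factor per cell shifts the
# methylation of bins in set "+1" up and bins in set "-1" down.
bin_sign = np.array([1, 1, -1, -1, 1, -1])
latent = rng.normal(size=n_cells)
mcg = (0.8
       + 0.05 * latent[:, None] * bin_sign[None, :]
       + rng.normal(scale=0.01, size=(n_cells, bin_sign.size)))

# Comethylation: Pearson correlation between bins (columns) across cells.
comethylation = np.corrcoef(mcg, rowvar=False)

# The leading eigenvector of the correlation matrix separates the two sets;
# its overall sign is arbitrary, so only the grouping is meaningful.
eigvals, eigvecs = np.linalg.eigh(comethylation)
pc1 = eigvecs[:, -1]
bin_group = pc1 > 0
```

Bins in the same set are positively correlated and bins in opposite sets are
negatively correlated, which produces the plaid appearance when the matrix is
drawn as a heatmap.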

We then determined domains at 25-kb reso- 
lution in single cells and found that neurons 
had more domains (median 4813) than non- 
neurons (median 4308) (P < 1 x 10°") but 
with smaller average size, resulting in a simi- 
lar domain-covered genome proportion (fig. 
S6, D and E). The number and size of domains 
were highly correlated with global gene ex- 
pression activity (fig. S6F). The boundary prob- 
ability of a genomic bin was defined as the 
frequency with which it was identified as a 
domain boundary across cells, which mirrored 
the insulation scores from the cell-type pseudo- 
bulk contact maps (Fig. 2, F and G). 
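
The boundary probability defined above is a per-bin frequency across cells. A
minimal sketch, assuming the single-cell domain calls have already been
converted to a boolean cells-by-bins matrix (the matrix below is hypothetical):

```python
import numpy as np

def boundary_probability(boundary_calls):
    """Fraction of cells in which each genomic bin is called a domain
    boundary. `boundary_calls` has shape (n_cells, n_bins)."""
    calls = np.asarray(boundary_calls, dtype=bool)
    return calls.mean(axis=0)

# Hypothetical calls for 4 cells and 5 bins (1 = bin is a boundary).
calls = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
prob = boundary_probability(calls)  # [0.75, 0.0, 0.25, 0.75, 0.0]
```

In practice the calls would be restricted to the cells of one cell type, giving
one probability track per type.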

Chromatin loops were delineated at 10-kb 
resolution in each of the 29 major types (and 
119 cell subtypes) with ≥100 m3C cells. We de- 
tected a median of 524,935 (541,551) loop pixels 
with 45,140 (59,905) loop summits among 
major types (subtypes) (fig. S6G). Of these, 
24.3% were interactions between distal DMRs 
(see later section for a systematic description 
of DMRs) and gene promoters [transcription 
start site (TSS) ± 2 kbp], 38.1% between distal 
DMRs, and 5.8% between promoters (fig. S6H). 


Cell-type specificity of 3D genome features 


Using compartment scores, domain boundary 
probabilities, or loop strengths, we were able to 
distinguish between cell types and determine 
Fig. 2. Diversity of 3D genome structures across major types. (A) Frequency 
of contacts against genomic distance in each single cell, Z-score normalized 
within each cell (column). The cells are grouped by major type and then ordered 
by the median log2 short/long ratio over cells. The y axis is binned at log2 scale. 
(B) Log2 short/long ratio of major types, ordered the same as in (A). (C) Imputed 
contact maps of four major types. (D) Heatmaps show the correlation matrices 
of distance normalized contact maps in (C), and line plots show PC1 of the 
correlation matrices. (E) Magnified view of two matrices in (D) and the 
corresponding correlation matrices of mCG across cells. (F and G) Imputed 
contact matrices (heatmap), boundary probabilities (blue lines), insulation 
scores (orange lines), differential boundaries (red dots in line plots), and 
differential loops (cyan dots in heatmaps) of excitatory IT neurons at FOXP2 
locus (a marker of cell type L4-IT) (F) or CGE-derived inhibitory neurons at 
LAMP5 locus (a marker of Lamp5 and Lamp5-Lhx6) (G). Gray shade represents 
the gene body (TSS to TES). (H) t-SNE plot of cells (n = 5707) using domains 
(top) or loops (bottom) as features, colored by major types. (I) PCC between 
compartment score, boundary probability, or loop interaction strength and 
ATAC signals, mCG and mCH fractions of the bin(s) across all major types for all 
genes (left) or top DEGs only (right). (J) PCC between compartment scores, 
boundary probabilities, or loop interaction strength, and gene expression across 




all major types for different categories of overlap (x axis) using all genes (left) or 
top DEGs (right). (K and L) Proportion of significantly positively or negatively 
correlated compartment (K) or domain boundary (L) out of all the bins located at 
different positions relative to a gene, average across the top neuronal DEGs. 
(M) Proportion of significantly correlated loop pixels out of all the loop pixels 
(left), ratio between positively and negatively correlated loop pixels (middle), or 


the hierarchy of their similarities (Fig. 2H; fig. 
S7, A to C; and materials and methods), indi- 
cating cell-type specificities of these 3D struc- 
tures. In particular, chromosome compartments 
could distinguish non-neurons, excitatory, inhib- 
itory, and MSN neurons, but had difficulty 
resolving finer major types within the excitatory or 
inhibitory cell classes (fig. S7, B and C). By 
contrast, both chromatin domains and loops 
better distinguished finer excitatory and 
inhibitory major types, and loops performed 
the best (Fig. 2H and fig. S7, B and C). This 
underscores the varying roles of different scales 
of 3D features in gene regulation across cell- 
type granularities, highlighting that loops could 
be more specific than domains. The primary 
goal of these analyses was to contrast compart- 
ments, domains, and loops in cell-type specific- 
ity, but not for cell-type clustering. The state-of- 
the-art of cell clustering on chromatin contacts 
is still based on the genomic bin pairs, as 
adopted by us (15) and other groups (20, 27) 
(fig. S7, B and C, and materials and methods). 

Systematic examination of specific 3D struc- 
tures across all (or neuronal) major types de- 
termined 1188 (1024) differential compartments 
(DCs), 2050 (1720) differential domain bound- 
aries (DBs), and 173,806 (148,395) differential 
loops (DLs) (fig. S8A and table S6). Chromatin 
domains were considered conserved across 
cell types in general (22-25), whereas they could 
display certain dynamics across cell types and 
development (3, 26-28). Our data further showed 
that chromatin domains could vary even be- 
tween closely related cell types (Fig. 2, F and 
G). DMR-DMR loops showed higher cell-type 
specificity than promoter-DMR or promoter- 
promoter loops (fig. S8B). Evaluating tran- 
scription factors (TFs) in differential chromatin 
looping, we found the motifs of cell-type- 
specific TFs (e.g., NFIX and NHLH1) were more 
enriched at anchors of DLs, whereas CTCF, a 
TF pivotal for chromosome structure, was high- 
ly enriched at housekeeping loops (fig. S8C). 
This implied that CTCF’s role is more in struc- 
tural loops than in cell-type-specific promoter- 
enhancer interactions. Many neuronal TFs (e.g., 
NEUROGI and NEUROG2) were enriched at 
the pan-neuronal loops but not pan-brain-cell 
loops (fig. S8C), concordant with their neuron- 
specific roles. 


Relationship between genome organization 
and other molecular modalities 


We examined the link between different 3D 
structural features and other epigenomic mo- 
dalities (mCG, mCH, and open chromatin). 
Across neuronal cell types, both mCG and 
mCH were anticorrelated with compartment 
scores, domain boundary probabilities, and 
loop strength (Fig. 2I and figs. S9, B and D, 
and S10, C and D). By contrast, open chroma- 
tin signals exhibited positive correlations with 
these structural features with similar or slight- 
ly weaker (absolute) correlations (Fig. 2I and 
figs. S9, B and D, and S10, C and D). These 
(anti-)correlations suggest orchestration among 
active compartments, strong domains and 
loop interactions, as well as open chromatin 
and methylation depletion corresponding to 
active chromatin states. Between the differen- 
tial structural features (DCs, DBs, and DLs) 
across cell types, DLs had stronger (anti-) 
correlations with mCG, mCH, and open chro- 
matin compared with DCs and DBs (fig. S10C), 
particularly at the loops with high variability 
across cell types (fig. S10D). Correlations across 
all cell types were generally weaker than in 
neurons alone (Fig. 2I and figs. S9 and S10). 
The anticorrelation observed between DNAm 
and 3D genome structures could have resulted 
from the effect of DNAm on the binding of 
factors driving genome folding (e.g., CTCF) 
(29), the recruitment or exclusion of methyla- 
tion writers or erasers (e.g., DNMTs and TETs) 
through high-order structural formation, or 
shared regulators of both methylation and ge- 
nome organization [e.g., Neurog2 in mouse 
cortex (30)]. Further developmental or mech- 
anistic studies are needed to resolve the 
causal relationship (29, 31). 

Gene expression was correlated with the 3D 
genome structures as well, particularly for the 
cell-type-specific genes (Fig. 2J). We identified 
1099 (1358) top differentially expressed genes 
(DEGs) by pairwise comparison across neuronal (all) major 
types. They exhibited strong positive correla- 
tions with all three structural features that 
overlapped with their gene bodies or promo- 
ters (Fig. 2J and figs. S11 to S13). For loops, the 
interaction strengths were more correlated 
with anchor-overlapped DEGs (on gene bodies 
or promoters) compared with the anchor- 
encompassed DEGs (P < 1 x 10-2"; Fig. 2J). 
The increasing variability of gene expression 
and/or structural signals of bins was linked 
to higher positive correlations between them, 
which corroborates the overlap between differ- 
ential structural signatures and differential gene 
expression (figs. S11, E and F, and S12, E and F). 

We further examined the relative location 
between 1099 neuronal DEGs and their corre- 
average PCC of significantly correlated loop pixels (right) located at different 
positions relative to a gene, average across the top neuronal DEGs. (N) Number 
of genes, out of the top neuronal DEGs, having significantly positively correlated 
compartments, domain boundaries overlapping the gene body, or loop pixels 
within the gene body or with at least one anchor overlaps the TSS or TES of the 
gene. Thirty-five genes were not included in any of the three circles. 


lated chromatin structures [false discovery rate 
(FDR) < 0.01; see the materials and methods] 
at surrounding regions [TSS-5Mb to transcrip- 
tion end site+5Mb (TES+5Mb)]. The correlated 
compartments were mostly within the gene 
body (Fig. 2K), and the correlated domain bound- 
aries were highly enriched at the TSS and TES 
(Fig. 2L), suggesting that the dynamics of gene 
body compartments and domains are associated 
with gene expression diversity. The loops with 
positive correlations were enriched within gene 
bodies, as well as between the TSS and TES and 
the gene body ± 1 Mb regions (Fig. 2M). Spe- 
cifically, 48% of the loops within the gene body 
were correlated with gene expression, among 
which 98% were positively correlated. By com- 
parison, a much smaller proportion of loops 
outside of the gene body were correlated with 
expression. A higher proportion of positively 
correlated loops were observed within the 
upstream and downstream regions of the DEGs 
and between the upstream or downstream re- 
gions and the gene body regions, indicating 
the regulatory domain of a gene structure. 
Among the 1099 DEGs, 453 (41.2%) had gene 
bodies overlapped by one or more genomic 
bins with positively correlated compartment 
scores, and 591 (53.8%) overlapped by one or 
more correlated domain boundaries. A total of 
1037 (94.4%) DEGs had TSS- or TES-anchored 
correlated loops, and 898 (81.8%) had corre- 
lated loops within gene bodies. These dynam- 
ics of chromatin architecture at different scales 
in total covered 96.8% of the DEGs (Fig. 2N), 
again suggesting a strong association between 
genome structures and gene expression diver- 
sity. Collectively, these analyses revealed the 
cell-type specificity of chromatin architecture 
and its relationship with other epigenomic and 
transcriptomic signatures at an unprecedented 
cell-type resolution in the human brain. 


Cell-type-specific DNA methylation patterns 
and associated gene-regulatory landscapes 


To delineate the cell-type-specific methylation 
profiles, we identified 24,455 CH- and 13,096 
CG- differentially methylated genes (DMGs) 
(fig. S14A and materials and methods) and 
2,059,466 CG-DMRs (Fig. 3A and materials and 
methods) across 188 brain cell subtypes. In ad- 
dition to depicting distinct epigenetic signa- 
tures for brain cell identities, these methylation 
patterns provide critical insight into under- 
standing gene-regulatory programs in brain 
cells, with gene body methylation negatively 
correlating with gene expression (5, 7, 32), DMRs 
marking putative CREs (4, 5), and TF motifs 
implicating candidate cell-type-specific regu- 
lators (32). 

We assigned TFs to specific cell types if 
they were hypomethylated DMGs (fig. S14B 
and materials and methods) and their motifs 
were enriched at the hypomethylated DMRs 
(hypo-DMRs) in the same cell types (see the 
materials and methods). In total, 612 TFs were 
assigned to major neuronal types and subtypes, 
where they potentially play important roles in 
shaping and maintaining cell identities. For ex- 
ample, TBR1 was assigned to deep-layer exci- 
tatory neurons, particularly L6-CT and L6b (Fig. 
3B), and it was noted to play a fate-determining 
role in the development of corticofugal projec- 
tion neurons (33). ZNF423 and EBF2 were both 
assigned to the cerebellar cell types (Fig. 3B). 
Both of them are crucial for cerebellum devel- 
opment, whereas EBF2 particularly directs the 
migration of PKJ cells (34-36). 

Analyzing subtypes further highlighted var- 
iations in TF utilization. For instance, the TF 
PBX3, assigned to the MSN-D1 major type prev- 
alent in the striatum, was only hypomethylated 
in the subtypes from the striosome compart- 
ment, not the matrix compartment of the stri- 
atum (Fig. 3B and fig. S14C). This indicates a 
preference for PBX3 expression in the striosome, 
corroborating previous observations (37, 38). 
Further examination of potential binding 
sites of PBX3 (hypo-DMRs with PBX3 motifs) 
showed lower average methylation fractions 
in striosome subtypes (Fig. 3B), suggesting a 
compartment-specific regulatory role of this 
TF in the striatum. 

We integrated DMGs, DMRs, and differen- 
tial loops to pinpoint putative CREs for each 
cell type (Fig. 3C). A gene was associated with 
a DMR if its TSS was within 5 Mb of the DMR. 
Further refinement retained only DMR-DMG 
pairs overlapping with both anchors of a loop 
or DL. Pearson correlations between mCG frac- 
tions of DMRs and mCH fractions of gene 
bodies across cell subtypes were calculated to 
assess the association (fig. S14D). Enhanced 
associations were observed particularly for DL- 
filtered DMRs (Fig. 3D and fig. S14D), which 
showed an increased overlap with open chro- 
matin regions as well (Fig. 3E). We identified 
3.2 million potential regulatory DMR and gene 
pairs between 1,122,919 DMRs and 12,327 genes 
(table S7). The methylation fractions of these 
DMRs, DMGs, and the strengths of their in- 
teractions (loops) were (anti-)correlated (Fig. 
3F), which could collectively orchestrate spe- 
cific gene-regulatory programs. For instance, 
the gene SYT1, encoding Synaptotagmin-1, a 
critical synaptic vesicle protein, exhibited lower 
methylation fractions of both the distal DMRs 
and the SYT1 gene body in L2/3-IT neurons and 
stronger interactions between the DMRs and 
the promoter compared with MSN-D1 neurons 
(Fig. 3G), leading to a higher expression of SYT1 
in L2/3-IT neurons than in MSN-D1 neurons 
(Fig. 3G and fig. S14E). Overall, the integration 
of CG and CH methylation with chromatin 
conformation revealed distinct cell-type reg- 
ulatory dynamics. 
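
The DMR-gene pairing described above (a 5-Mb distance filter followed by a
loop-anchor filter) can be sketched as follows. This is a simplified
illustration under stated assumptions: each DMR is represented by its
midpoint, each gene by its TSS, and loops by pairs of 10-kb bin indices. The
function names, the tuple layouts, and the example coordinates are all
hypothetical.

```python
def bin_of(position, resolution=10_000):
    """Index of the fixed-size genomic bin containing a position (bp)."""
    return position // resolution

def link_dmrs_to_genes(dmrs, genes, loops,
                       max_dist=5_000_000, resolution=10_000):
    """Return (dmr_name, gene_name) pairs whose TSS lies within `max_dist`
    of the DMR and whose DMR bin and TSS bin are the two anchors of a loop.

    dmrs:  iterable of (chrom, start, end, name)
    genes: iterable of (chrom, tss, name)
    loops: iterable of (chrom, anchor_bin_1, anchor_bin_2)
    """
    loop_set = {(c, min(a, b), max(a, b)) for c, a, b in loops}
    pairs = []
    for d_chrom, d_start, d_end, d_name in dmrs:
        d_mid = (d_start + d_end) // 2
        for g_chrom, tss, g_name in genes:
            # Step 1: same chromosome and TSS within 5 Mb of the DMR.
            if g_chrom != d_chrom or abs(tss - d_mid) > max_dist:
                continue
            # Step 2: DMR and TSS must sit on the two anchors of a loop.
            a, b = bin_of(d_mid, resolution), bin_of(tss, resolution)
            if (d_chrom, min(a, b), max(a, b)) in loop_set:
                pairs.append((d_name, g_name))
    return pairs

# Hypothetical inputs (illustrative only).
dmrs = [
    ("chr1", 1_000_200, 1_000_800, "dmr_a"),  # looped to the gene
    ("chr1", 3_000_000, 3_000_500, "dmr_b"),  # nearby but not looped
    ("chr1", 9_000_000, 9_000_400, "dmr_c"),  # looped but beyond 5 Mb
]
genes = [("chr1", 1_505_000, "GENE_X")]
loops = [("chr1", 100, 150), ("chr1", 150, 900)]

pairs = link_dmrs_to_genes(dmrs, genes, loops)  # [("dmr_a", "GENE_X")]
```

The retained pairs would then be scored by correlating the mCG fraction of
each DMR with the mCH fraction of the paired gene body across cell subtypes.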

Numerous noncoding loci linked to brain 
diseases have been pinpointed by genome-wide 
association studies (GWASs), with many in 
enhancer regions (39). DMRs and loops help 
to localize these genetic variants to specific 
cell-type regulatory elements. Using linkage 
disequilibrium score regression (LDSC) (40), 
we detected associations between 20 brain dis- 
eases or traits and DMRs or loop-overlapping 
DMRs in human brain cells (Fig. 3H, fig. S15A, 
and materials and methods). Schizophrenia, 
bipolar disorder, and neuroticism risk variants 
were prominently enriched in hypo-DMRs of 
excitatory neurons in the cortices and HIP, 
whereas Alzheimer’s disease aligned with mi- 
croglia [MGC; Fig. 3H (42)]. Tobacco usage dis- 
order variants associated with the Foxp2 cell 
type from the basal ganglia (Fig. 3H), an area 
linked to tobacco addiction (42). Further explo- 
ration into disease risk variants revealed diverse 
impacts on gene regulations. Although many 
cell types are related to the same diseases, the 
risk variants that are implicated could be di- 
verse. For example, the schizophrenia risk var- 
iant rs2789588 was implicated in both L2/3-IT 
and L6-CT neurons with similar epigenetic fea- 
tures, whereas rs17194490 was only implicated 
in L2/3-IT with specific DNA hypomethylation, 
stronger long-range interaction with the corre- 
sponding gene, and higher gene expression com- 
pared with L6-CT (fig. S15B). 


Regional heterogeneity in cortices and 
basal ganglia 


Beyond cell-type diversity, heterogeneity within 
shared cell types across regions has been noted 
in the neocortex in both gene expression (43-45) 
and DNA methylation (7). Our extensive epi- 
genomic dataset further explores gene reg- 
ulation heterogeneity across broader cortical 
regions and subcortical regions. To discern 
regional diversity from other cell-type hetero- 
geneities, we devised a workflow to unveil the 
regional landscape within single-nuclei DNA 
methylation profiles (Fig. 4A). Integrating these 
profiles with brain region data, we mapped 
the cells to a “regional methylation space” (Fig. 
4A and materials and methods), where cells 
closer together have methylation neighbors 
from similar brain regions. In this regional 
methylation space (Fig. 4A and materials and 
methods), trajectories depict regional transi- 
tions alongside associated methylome shifts, 
thereby enhancing our grasp of regional DNA 
methylation effects. 

Cortical excitatory neurons exhibited marked 
regional diversity in methylation, particularly 
the LX-IT neurons (Fig. 4B). The regional di- 
versity of cortical inhibitory neurons (46) was 
less studied due to their inconspicuous re- 
gional patterns in transcriptome and epige- 
nome (1, 47, 48). Our analysis reveals regional, 
although less pronounced, distinctions among 
cortical inhibitory neurons (Fig. 4B). Regional 
axes of each cortical neuronal cell type were 
constructed through single-cell trajectory analy- 
sis (49). We observed a shared ordering of 
brain regions on the axes among cortical neu- 
rons, from the posterior regions of the brain 
[e.g., primary visual cortex (V1C)] to the ante- 
rior lateral regions [e.g., prefrontal cortex (A46) 
and middle temporal gyrus (MTG)] and then to 
the anterior medial regions [e.g., anterior cin- 
gulate cortex (ACC) and lateral entorhinal 
cortex (LEC); Fig. 4, C and D]. Only L6-CT 
deviated (Fig. 4, B and C) 
from this posterior-lateroanterior-medioanterior 
trend. Nevertheless, the shared trend allowed 
for further analysis of a consensus regional axis 
for cortical neurons (Fig. 4C and materials and 
methods). 

Epigenetic alterations along this axis sug- 
gest regional specification of CX. For instance, 
the TF NR2F1 (also known as COUP-TFI) has 
gradient expression during brain development, 
which is vital for establishing the caudal-rostral 
regional specialization in the neocortex (43) 
and the boundary between the neocortex and 
the entorhinal cortex (50). Our data showed 
low gene body methylation in V1C (P) and LEC 
(MA) and high methylation in A46 (LA; Fig. 
4G and fig. S16B), accompanied by a reversed 
trend of gene expression (fig. S16A). Two chro- 
matin domains associated with NR2F1 showed 
interaction strengths changing in the opposite 
direction (Fig. 4F). In V1C, the upstream do- 
main interacted more with NR2F1’s promoter 
and had hypo-DMRs compared with LEC. By 
contrast, the downstream domain displayed 
a stronger interaction with NR2F1’s promoter 
and featured DMRs hypomethylated in LEC 
(Fig. 4, F and G). Two neighbor genes, 
NR2F1-AS1 and FAM172A, encompassed in these two 
domains, respectively, showed concordant ex- 
pression trends with the domain strengths (fig. 
S16A). Such coherent variations in epigenetics 
and transcription imply regulatory domain 
switching and alternative CRE usage to acti- 
vate the same gene in different cortical regions, 
which needs further investigation. 

Systematic examination of regionally differ- 
ential epigenetic features in cortical neurons 
determined in total 14,606 (average 2.9 thou- 
sand for each major type) regional DMGs 
(rDMGs), 885.4 thousand (63.2 thousand) re- 
gional DMRs (rDMRs), 773 thousand (71.2 thou- 
sand) regional differential loops (rDLs), and 
1495 (136) regional differential domain bound- 
aries (rDBs) (fig. S16B and materials and me- 
thods). Many rDMGs and rDMRs showed 
monotonic methylation gradients along 
the posterior-lateroanterior-medioanterior 
axis (Fig. 4E and fig. S16, C and D), whereas 
more complex patterns (such as NR2F1) also 
existed. 

Basal ganglia neurons exhibited substantial 
regional diversity as well. A lateral-dorsal- 
ventral axis became evident in the basal gan- 
glia (Fig. 4H), with accompanying epigenetic 
shifts. For instance, moving from NAC through 
CaB to Pu, the LSAMP gene increased in mCH 
(Fig. 4, I and J) and decreased in strengths of 
chromatin domains and loops around it (Fig. 4K). 
We determined 6371 rDMGs and 398.8 thou- 
sand rDMRs in the four major types of basal 
ganglia (MSN-D1, MSN-D2, Foxp2 and Chd7; 
fig. S16B), and 98,276 (50,271) rDLs and 193 
(99) rDBs (fig. S16B) in MSN-D1 and MSN-D2 
cells. Most rDMGs and rDMRs showed strong 
(anti-)correlations with the lateral-dorsal- 
ventral axis (fig. S16, C to F), highlighting re- 
gional variation as a key to basal ganglia 
within-cell-type heterogeneity. Distinctions in 
both functions and neural connections be- 
tween the dorsal (CaB and Pu) and the ven- 
tral parts of basal ganglia, particularly its major 
component striatum, have been noted previ- 
ously (57, 52). Our data and analysis provided 
the epigenetic basis of the dorsal-ventral dif- 
ferences and refined the regional differences 
within the dorsal basal ganglia (Fig. 4H). 

A considerable number (427 of 746) of TF 
motifs were enriched in rDMRs (fig. S16G and 
materials and methods). Approximately 47% 
of these TFs were expressed in the correspond- 
ing cell types (fig. S16H), with expression (anti-) 
correlated with the regional axes (e.g., fig. 
S16I). These findings hint at potential region- 
specific regulatory mechanisms in the brain, 
possibly underlying functional diversities. 


Conservation of brain cell types and DMRs 
between humans and mice 


Brain cell-type conservation between primates 
and rodents was noted in several neocortical 
regions (17, 53). To assess whether the conser- 
vation holds in broader brain regions, we 
compared the single-nucleus DNA methylation 
profiles from human and mouse (1), using cor- 
responding regions including the CX, BF, BN, 
and HIP (see the materials and methods). The 
integration analysis showed that three major 
types defined in human brains were discrep- 
ant with mouse brain cells (Fig. 5A). Mouse 
L4-IT neurons aligned only to subpopulations
of their human counterparts (Fig. 5B), confirming
a larger heterogeneity in human L4-IT
neurons (17). The human HIP-Misc1 neurons
were integrated with some mouse cortical IT 
neurons, and HIP-Misc2 neurons did not match 
any mouse cell type. The parallel snRNA-seq 
dataset (17) validated these two human HIP 
cell types (Fig. 5C and fig. S17B). Although the 
unmatched cell types will need further inves- 
tigation, the major type taxonomies were gen- 
erally conserved across broader brain regions 
between humans and mice (Fig. 5A and fig.
S17A), whereas both global CG and CH methylation
were consistently higher in humans than
in mice for the corresponding cell types (Fig. 5D 
and fig. S17C). 

To compare the gene regulation between 
human and mouse brains, we used liftOver to 
match major type hypo-DMRs identified with- 
in single species (Fig. 5E). Approximately 40 to 
60% of hypo-DMRs across cell types had ortho- 
log sequences in the other species (which we 
refer to as OrthSeqs). About half of OrthSeqs 
had their orthologs also hypo-DMRs in the other 
species (OrthDMRs). Most (95%) of OrthDMRs 
were reciprocally matchable (CnsvDMRs; 
Fig. 5F and fig. S17D). Methylation fractions 
of CnsvDMRs showed substantial correlations 
across cell types between human and mouse 
(Fig. 5, G and H), suggesting functional con- 
servation between species. 

We further selected the most highly correlated
DMRs (hcCnsvDMRs, Fig. 5G). Functional
enrichment analysis of hcCnsvDMRs
showed that they were enriched in biological 
processes related to forebrain development 
and in cellular components related to den- 
drites and synapses (fig. S17, F and G, and 
materials and methods). Comparison with 
histone modifications in mouse forebrains (4) 
demonstrated that these DMRs were depleted 
from heterochromatic regions (H3K9me3) 
and enriched in regions of enhancers (H3K27ac 
and H3K4me1), promoters (H3K4me3), and
poised enhancers (H3K27me3; fig. S17E and 
materials and methods). Categorizing the 
hcCnsvDMRs further into open or closed status 
based on their chromatin accessibility (2) 
showed that open DMRs were enriched in en- 
hancers and promoters. By contrast, closed 
DMRs were particularly enriched in the poised 
enhancers (Fig. 5I), which had probably been
active during development. 

Methylation conservation between species 
hints at a strategy for enhancer discovery 
through comparative epigenetics. For example,
INPP5J, a gene specific to Pvalb neurons,
had many distal and proximal hcCnsvDMRs
overlapping with matched chromatin-accessible 
regions (Fig. 5J), including two that were val- 
idated as specific enhancers for viral targeting 
of mouse Pvalb neurons (Fig. 5J) (54). 


Single-cell methylation barcodes reliably 
predict human brain cell identity 


DNA methylation variation in the genomes of 
cells contains molecular “engrams” represent- 
ing past and present gene-regulatory events (55). 
We observed distinct DNA methylation patterns 
on many CpG sites highly specific to brain cell 
types (e.g., fig. S18B). This led us to devise
single-cell methylation barcodes (scMCodes) to
determine brain cell types at the single-cell level
using the methylation status of selected CpG
sites (Fig. 6A, fig. S18A, and materials and
methods).

We first selected CpG sites distinguishing
brain cell types iteratively (see the materials 
and methods). These sites were further clus- 
tered into 39 thousand groups according to 
their across-cell-type methylation patterns. 
We then assessed their cell-type predicting 
power through machine learning models with 
cross-validation (fig. S18A and materials and
methods). Eight hundred groups with a total 
of 12 thousand CpG sites were selected as the 
scMCodes (Fig. 6, B and C, and tables S8 and 
S9) to achieve good predicting power (Fig. 6D) 
while minimizing feature number (fig. S18C). 
These scMCodes achieved ~93% accuracy (Fig. 
6D and materials and methods). 

Cross-donor tests were conducted among 
the three donors of this study and an external 
individual (5). The results showed high pre- 
diction accuracies (~92 to 96%; Fig. 6E), dem- 
onstrating the cross-individual robustness of 
the scMCode approach. Single-cell sequenc- 
ing has limited genomic coverage. On average, 
only ~200 CpG sites of scMCodes were de- 
tected in each cell (Fig. 6F), which underscores 
the effectiveness of scMCode in determining 
human brain cell types using a few hundred 
select methylation sites. 


Discussion 


A profound understanding of cellular diversity 
and distinctive gene-regulatory mechanisms 
in the human brain is pivotal for elucidating 
brain functions and formulating therapeutics 
for brain disorders. We have compiled a com- 
prehensive single-cell DNA methylation and 
3D genome structure atlas of human brains 
with 524,010 deeply sequenced nuclei from 
46 distinct brain regions, permitting us to iden- 
tify 188 epigenetically distinct cell types. The 
extensive profiling of brain regions in this 
study has allowed us to identify cell types 
specific to subcortical regions and to compare 
epigenetic diversity within the same cell type 
across different brain regions, which consid- 
erably expands previous work (3, 5, 56). Addi- 
tionally, we have made considerable strides in 
understanding the 3D genome diversity across 
brain cell types and regions, facilitated by a 
30-fold increase in cell profiling using snm3C- 
seq. Moreover, the specificity of domains and 
loops across 29 cell types was determined, 
pushing the cell-type resolution extensively 
beyond previous studies (28, 57-59). 
Single-nucleotide-resolution DNAm has prov- 
en valuable in predicting epigenetic age (60), 
tracing cell lineage (61, 62), and diagnosing 
life-threatening diseases (63, 64). The intricate 
regulatory information encoded in DNAm has 
enabled us to distill a set of scMCodes for
reliable cell-type identification. Given that 
circulating-free DNA methylation has been rec- 
ognized as a robust tool for cancer diagnosis 
(58) and has provided promising biomarkers 
for brain disorders (59), our scMCode method
is a potentially transformative tool for the
noninvasive diagnosis of brain disorders. It could
aid in pinpointing pathological brain cell types
and inform treatment selection, marking a stride
forward in precision medicine.

However, several avenues warrant further
exploration. First, beyond the 46 brain regions
sampled in this study, the human brain has
other intricate structures with complex cell
diversity, particularly in subcortical regions. A
more comprehensive brain region sampling
beyond what was available for this study would
provide deeper insights into the underlying
gene regulation complexities. Second, the
availability of high-quality tissues restricted us to
only three male donors. Although this satisfied
the purpose of this study for surveying human
brain cell types and revealing their epigenomic
patterns, expanding the donor base would further
elucidate individual variations of brain cells
alongside the genetic impact on gene regulatory
diversity. Finally, our findings largely stem
from molecular modality correlations. Verifying
these associations is imperative for delineating
the functionality of regulatory elements,
mapping regulatory networks, and harnessing
putative enhancers for cell subtype studies.

Fig. 5. Cross-species comparison between human and mouse brain cell
methylomes. (A) Integration of single-cell methylomes between human and mouse
brains visualized using 2D t-SNE. (B) Discrepancy between cell types of human
and mouse brains in cell types L4-IT, HIP-Misc1, and HIP-Misc2. (C) CH
hypomethylation and gene expression of TF TSHZ2 in the cell types HIP-Misc1
and HIP-Misc2. (D) Correlated global mCH and mCG of conserved cell types between
human and mouse. (E) Schematic of cross-species matching of cell-type DMRs.
(F) Overall, ~50% of DMRs have orthologous sequences in the other species, among
which ~25% are reciprocal DMRs. (G) Distribution of cross-species correlation
of DMR methylations (red) and the randomly shuffled background (black).
(H) Examples of methylation fractions of hcCnsvDMRs. (I) Enrichment of the
hcCnsvDMRs in the histone modification marks. (J) Browser view of hcCnsvDMRs
around gene INPP5J in major type Pvalb. The regions colored in red are the
cell-type-specific distal enhancers validated in (54).

Fig. 6. snMCodes for brain cell types. (A) Workflow of deriving snMCodes. (B) snMCodes derived from
all three donors. (C) Examples of cell-type specificity of snMCode features. (D) Heatmap showing
confusion matrix of snMCodes in predicting cell types. (E) Cell-type prediction accuracy in cross-donor test.
(F) snMCodes predict human cell types with a limited number of CpG sites at single-cell resolution.

Overall, this multimodal human brain cell 
atlas enriches our understanding of brain cells 
with a foundational epigenomic perspective. It 
offers not only an invaluable resource for ex- 
ploring cell-type diversity, gene regulation 
complexity, regional variation, and evolution- 
ary conservation within brain cells, but also 
provides the essential elements, such as puta- 
tive regulatory elements, for the development 
of innovative genetic tools for cell-type-specific 
targeting. 


Tian et al., Science 382, eadf5357 (2023) 


Materials and Methods 
Human postmortem tissue specimen screening 
These studies were intended to be the first ex- 
plorations of cellular, transcriptional and epi- 
genomic variation across the human brain using 
the latest single nucleus methylome (this study), 
snRNA-seq and snATAC-seq technologies. These 
methods perform best on tissue of the highest
quality, prepared using optimized methods.
These involved a short postmortem interval
(PMI; target <12 hours), highly consistent
tissue slabbing, photo documentation to
ensure anatomically precise sampling, freezing
with supercooled isopentane to preserve
tissue integrity, and proper storage under
vacuum in -80°C freezers. In addition to low
PMI, stringent exclusionary criteria were ap- 
plied for RNA integrity (>7.0), infectious dis- 
eases, head trauma, intubation, neuropathology, 
and manner of death. 

The availability of tissues was a challenge
given the highly stringent exclusionary criteria.
Brain specimens meeting these criteria,
and for which whole brain hemispheres could
be obtained for the current study, were quite
rare and with a heavy male bias. Between 2018 
and 2022, 16 donors met these criteria, with 
only three female donors who ultimately were 
excluded based on QC or other exclusionary 
criteria. The three donors passing all exclu- 
sionary and QC criteria were all males. 


Human postmortem tissue specimen processing 


De-identified adult postmortem human brain 
tissue was obtained after receiving permission 
from the deceased’s next of kin. Tissue collec- 
tion was performed per the United States Uni- 
form Anatomical Gift Act of 2006, described in 
the California Health and Safety Code section 
7150 (effective 1/1/2008) and other applicable 
state and federal laws and regulations. In ad- 
dition, the Western Institutional Review Board 
reviewed tissue collection procedures and de- 
termined that they did not constitute human 
subjects research requiring institutional review 
board review. 

Male donors 18 to 68 years of age with no 
known history of neuropsychiatric or neuro- 
logical conditions were considered for inclusion 
in the study. Routine serological screening 
for infectious diseases (HIV, hepatitis B, and 
hepatitis C) was conducted using donor blood 
samples, and donors testing positive for infec- 
tious diseases were excluded from the study. 
Specimens were screened for RNA quality, and 
samples with average RNA integrity values 
>7.0 were considered for inclusion in the study. 
Postmortem brain specimens were processed 
as previously described (17) (dx.doi.org/10.17504/ 
protocols.io.bf4ajqse). Briefly, coronal brain slabs 
were cut at 1-cm intervals, photographed, frozen
in dry-ice cooled isopentane, and transferred 
to vacuum-sealed bags for storage at -80°C 
until the time of further use. For the dissection 
of brain regions of interest, photos of tissue 
slabs were annotated by a neuroanatomist to 
outline regions to target for dissections. Then, 
tissue slabs were removed from the -80°C freezer 
and briefly transferred to -20°C, where they 
were held for ~1 to 3 hours to allow tissues 
to equilibrate to -20°C. Tissues were then 
transferred to a custom temperature-controlled 
cold table held at -20°C and the region of in- 
terest was removed using standard razor 
blades or scalpels. Tissue blocks were 
stored at -80°C in vacuum-sealed bags until 
later use. 


Nuclei isolation and FANS 


Nucleus isolation was conducted using a stan- 
dard protocol as previously described (dx.doi. 
org/10.17504/protocols.io.y6rfzd6). Gating on 
DAPI and NeuN fluorescence intensity was as 
described previously (17). NeuN-positive and 
NeuN-negative nuclei were sorted into sepa- 
rate tubes and were pooled at a defined ratio 
of 90% NeuN-positive and 10% NeuN-negative
nuclei after sorting. Sorted samples were
centrifuged, frozen in a solution of 1x phosphate-buffered
saline (PBS), 1% bovine serum albumin
(BSA), 10% dimethylsulfoxide (DMSO), and 0.5%
RNasin Plus RNase inhibitor (Promega, N2611),
and stored at -80°C until further processing. 
The presorted nuclei pellets were defrosted 
and resuspended in Dulbecco’s PBS (DPBS) 
+ 1% BSA, centrifuged, resuspended back in 
1 ml of DPBS, and sorted into 384-well plates. 
Nuclei from donors H19.30.001 and H19.30.002 
were prepared and sorted into 384-well plates. 
For donor H19.30.004, frozen tissue blocks 
received from AIBS were processed follow- 
ing procedures previously described (7). Nu- 
clei were labeled for NeuN fluorescence and 
sorted into 384-well plates as described previ- 
ously (1). 


Library preparation and Illumina sequencing 
snmC-seq library preparation


snmC-seq3 libraries were prepared using an 
updated version of snmC-seq2. In brief, sam- 
ples underwent bisulfite conversion and were 
barcoded with random primers. Samples were 
then pooled through two SPRI cleanups to 
compress 16 x 384-well plates into 1 x 96-well 
plates. Pooled samples were then adapted and 
amplified as previously described. Next, libra- 
ries were pooled and cleaned through two 
more SPRI cleanups. Finally, library concentra- 
tions were determined by Qubit and normal- 
ized for sequencing. snmC-seq3 and snm3C-seq 
(see below) libraries generated from human 
brain tissues were sequenced using an Illumina
NovaSeq 6000 instrument with S4 flow cells
in 150-bp paired-end mode.


snm3C-seq library preparation


For some samples from donors H19.30.001 and 
H19.30.002, presorted nuclei were used. The 
presorted nuclei pellets were defrosted and 
resuspended in DPBS+1%BSA, centrifuged, and 
resuspended back in 1 ml of DPBS. For the 
remaining samples of donors H19.30.001 and 
H19.30.002 and all samples from donor 
H19.30.004, frozen tissue was pulverized 
using a mortar and pestle. All samples were 
then immediately cross-linked with 2% form- 
aldehyde in solution for 5 min, quenched with 
0.2 M glycine for 5 min, centrifuged, washed 
with DPBS, and stored at -80°C until ready for 
further processing. Next, nuclei were con- 
ditioned and digested using an Arima kit 
adapted for snm3C-seq for 1 hour at 37°C, and 
20 min at 65°C to inactivate enzymes, then 
ligated for 15 min at room temperature. 
Finally, nuclei were resuspended in 1 ml of 
DPBS + 1% BSA, filtered through a 40-µm
filter, and sorted similarly to the snmC-seq3 
samples. 

The detailed protocols for snmC/snm3C-seq 
were described previously (dx.doi.org/10.5281/ 
zenodo.8319891). 


Donor-specific genomes 

gDNA Library prep protocol 

Genomic DNA was extracted from ground, 
frozen tissue using the DNeasy Blood and Tis- 
sue Kit (Qiagen, Valencia, CA). One microgram 
of DNA was fragmented with a Covaris S2 
(Covaris, Woburn, MA) to 300 bp, followed 
by end repair (Lucigen) and the addition of 
a 3’ A base (New England Biolabs). Cytosine- 
methylated adapters provided by Illumina 
(Illumina, San Diego, CA) were ligated to the
sonicated DNA at 16°C for 16 hours with T4 
DNA ligase (New England Biolabs). Adapter- 
ligated DNA was isolated by two rounds of 
purification with AMPure XP beads (Beckman
Coulter Genomics, Danvers, MA). The adapter- 
ligated DNA molecules were enriched by four 
cycles of polymerase chain reaction (PCR) with 
the following reaction composition: 25 µl of
Kapa HiFi Hotstart (KapaBiosystems, Woburn,
MA) and 5 µl of TruSeq PCR Primer Mix
(Illumina) (50 µl final). The thermocycling
parameters were: 95°C for 2 min, 98°C for 30 s,
and then four cycles of 98°C for 15 s, 60°C for
30 s, and 72°C for 1 min, ending with one
5-min step at 72°C. The reaction products were
purified using AMPure XP beads. The purified
PCR products constituted the library
used for subsequent sequencing
on a NovaSeq 6000.


Variant calling from donor 
genome sequencing 


Whole genome sequencing reads were first 
QCed with the software fastp (v0.20.1) (65). 
The command line used is “fastp -i input_PE_R1.fastq.gz -I input_PE_R2.fastq.gz -o output_PE_R1.fastq.gz -O output_PE_R2.fastq.gz -w 4”.
The QCed reads were then mapped to human 
genome assembly GRCh38 (hg38) using the 
software BWA (v0.7.17) (66) with the BWA-MEM 
algorithm with mapping results stored in bam 
format through the software samtools (v1.10) 
(67). Specifically, the command line used for 
mapping is “bwa mem -t 20 hg38-ref input_PE_R1.fastq.gz input_PE_R2.fastq.gz | samtools view -Sb - > output.bam”.
The mapped reads
were analyzed with the germline short vari- 
ant discovery workflow of the Genome Anal- 
ysis Toolkit (GATK, v4.1.8.1) (68). Briefly, we 
first removed the duplicated reads from the 
mapped reads, which then went through a base 
quality score recalibration step (BQSR) to gen- 
erate analysis-ready reads. The variant refer- 
ences used in the BQSR step were dbSNP138, 
Mills and 1000 Genomes gold standard indels, 
and 1000 Genomes phase 1 single nucleotide 
polymorphisms (SNPs). Next, candidate var- 
iants (SNPs+InDels) were called with the 
HaplotypeCaller of GATK from the analysis- 
ready reads and further filtered with a variant 
quality score recalibration step (VQSR) to de- 
termine the high-confidence SNPs and InDels, 
respectively. The variant references used to
recalibrate SNP quality scores were HapMap
3.3, OMNI 2.5, 1000 Genomes phase 1, and
dbSNP138, and those for InDels were Mills and
1000 Genomes gold standard indels and dbSNP138.
All the references used in the BQSR and VQSR
were downloaded from the GATK resource bundle
(https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0).


Donor-specific reference genome 


For each donor, we selected the high-confidence 
homozygous SNPs using the function Select- 
Variants of GATK, and created donor-specific 
reference genomes by substituting the homozygous
SNPs into the hg38 FASTA file using the
function FastaAlternateReferenceMaker of GATK.
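
As a rough illustration of the substitution step (not the GATK implementation), the sketch below applies homozygous SNPs to a reference sequence held in memory. The function name `apply_homozygous_snps` and the `(pos, ref, alt)` tuple layout with 1-based positions are assumptions made for this example.

```python
def apply_homozygous_snps(sequence, snps):
    """Return a copy of `sequence` with homozygous SNPs substituted.

    `snps` is an iterable of (pos, ref, alt) tuples with 1-based positions,
    mirroring VCF coordinates. Only single-base substitutions are handled.
    """
    bases = list(sequence)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only single-nucleotide substitutions are supported")
        if bases[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


# Toy example: two homozygous SNPs on an 8-bp reference.
donor_seq = apply_homozygous_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "G")])
```

Checking the reference allele before substituting guards against coordinate mismatches between the variant list and the FASTA file.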


Common homozygous SNPs 


By comparing the homozygous SNPs of donors, 
we constructed a list of common SNPs shared 
among the three donors. 


Mapping and count/feature matrix generation 


For sequence read mapping of both snmC- 
seq3 and snm3C-seq datasets, we used our own 
custom pipeline (https://github.com/lhqing/cemba_data,
version 1.2.1.dev94+2¢c65e173).
The main steps of this pipeline included: (i) 
demultiplexing FASTQ files into single-cell; 
(ii) reads-level quality control (QC); (iii) map- 
ping; (iv) BAM file processing and QC, and (v) 
final molecular profile generation. The details 
of the five steps were previously described (10).
We mapped all of the reads to the donor- 
specific genomes. After mapping, we calcu- 
lated the methylcytosine counts and total 
cytosine counts for two sets of genomic fea- 
tures in each cell. Nonoverlapping chromosome 
100-kb bins of the hg38 genome (generated by 
“bedtools makewindows -w 100000”), were 
used for clustering analysis, and the genes 
defined by the human GENCODE v33 were 
used for cluster annotation and integration 
with datasets. Both CG and CH methylation 
levels of the features were normalized as pre- 
viously described (1). The cell-by-feature matrices 
were generated from normalized methylation 
levels of each feature set. 


Quality control measures 


The sequenced cells were filtered based on the 
following metrics: (i) mCCC% < 0.06; (ii)
global mCG% > 0.5; (iii) global mCH% < 0.15; (iv) 
total final reads > 250,000; and (v) mapping 
rate > 0.5. For cells profiled with snm3C-seq, 
we required a cell to have > 50,000 cis contacts 
with a distance >2500 bp. 
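
The thresholds above can be expressed as a simple per-cell filter. This is a minimal sketch, assuming each cell is summarized as a dictionary and that the methylation metrics are stored as fractions; the key names (`mccc_frac`, `cis_long_contacts`, and so on) are hypothetical and not taken from the pipeline.

```python
def passes_qc(cell, is_snm3c=False):
    """Apply the listed QC thresholds to one cell's summary metrics."""
    ok = (
        cell["mccc_frac"] < 0.06          # mCCC level
        and cell["mcg_frac"] > 0.5        # global mCG level
        and cell["mch_frac"] < 0.15       # global mCH level
        and cell["final_reads"] > 250_000
        and cell["mapping_rate"] > 0.5
    )
    if is_snm3c:
        # cis contacts spanning more than 2500 bp
        ok = ok and cell["cis_long_contacts"] > 50_000
    return ok


example_cell = {
    "mccc_frac": 0.01, "mcg_frac": 0.8, "mch_frac": 0.05,
    "final_reads": 600_000, "mapping_rate": 0.7,
    "cis_long_contacts": 120_000,
}
```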


Clustering and annotation of snmC-seq3 data 
Clustering analysis 


CG- and CH-methylation levels of 100-kb ge- 
nomic bins were used as input features for 
clustering. We performed clustering analysis
iteratively using the software package ALLCools
(https://github.com/lhqing/ALLCools). In each
iteration, the 100-kb bins were first filtered by 
removing bins with mean total cytosine base 
calls <250 or >3000. Those that overlapped 
with the ENCODE blacklist (69) were also 
excluded from the clustering analysis. The 
top 5000 highly variable features (HVFs) were 
then selected separately from both CG and 
CH methylation using support vector regression 
(SVR). We then applied principal component
analysis (PCA) to each set of 5000 features to
reduce dimensionality. The top n principal
components (PCs) were selected for each methylation
type until there was no significant difference 
between the distributions of n-th and (n+1)-th 
PCs by two-sample Kolmogorov-Smirnov test 
with the criteria as the adjusted P < 0.1. We 
further performed preclustering for each top 
PC set and selected the PCs that were en- 
riched in preclusters (adjusted P values < 
0.05). Finally, the selected PCs from both CG 
and CH methylation PCs were concatenated 
for further analysis. We used Harmony (70) on 
the selected PCs to eliminate individual differ- 
ences. The Harmonized features were further 
fed into the consensus clustering procedures 
previously described (1).


Doublet/debris identification 


The read number of each cell in one plate was 
stable in both snmC-seq3 and snm3C-seq.
Therefore, we adopted a doublet/debris detection
strategy based on each cell's read number
relative to its plate. We first normalized the read number
per cell to the mean reads of its plate. The cells 
with plate-relative-read numbers >1.2 or <0.8 
were considered doublet/debris candidates. 
After each iteration of clustering, clusters 
would be labeled as doublet/debris if the 
cluster contained >80% doublet/debris candi- 
dates and were eliminated from further analysis. 
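
The two-stage rule described above (flag cells by plate-relative read number, then drop clusters dominated by flagged cells) can be sketched as follows. The function names and the dictionary-based inputs are assumptions made for illustration; only the cutoffs (0.8, 1.2, and 80%) come from the text.

```python
from collections import defaultdict


def flag_candidates(reads_by_cell, plate_by_cell, low=0.8, high=1.2):
    """Return cells whose plate-relative read number is <low or >high."""
    reads_per_plate = defaultdict(list)
    for cell, reads in reads_by_cell.items():
        reads_per_plate[plate_by_cell[cell]].append(reads)
    plate_mean = {p: sum(v) / len(v) for p, v in reads_per_plate.items()}
    flagged = set()
    for cell, reads in reads_by_cell.items():
        relative = reads / plate_mean[plate_by_cell[cell]]
        if relative > high or relative < low:
            flagged.add(cell)
    return flagged


def doublet_clusters(cluster_by_cell, flagged, min_fraction=0.8):
    """Return clusters in which more than `min_fraction` of cells are flagged."""
    size = defaultdict(int)
    hits = defaultdict(int)
    for cell, cluster in cluster_by_cell.items():
        size[cluster] += 1
        hits[cluster] += cell in flagged
    return {c for c in size if hits[c] / size[c] > min_fraction}


reads = {"a": 100, "b": 105, "c": 95, "d": 100, "e": 160, "f": 50}
plates = {cell: "plate1" for cell in reads}
clusters = {"a": "c1", "b": "c1", "c": "c1", "d": "c1", "e": "c2", "f": "c2"}
candidates = flag_candidates(reads, plates)
removed = doublet_clusters(clusters, candidates)
```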


Cell-type annotation 


The clusters were manually annotated as major 
or subtypes according to their hypomethylated 
genes, which were either canonical brain cell- 
type markers or determined de novo from the 
current dataset. We required each cell type to 
have at least five DMGs in CG and CH methyl- 
ation compared with the other cell types. Other- 
wise, it would be merged with the closest cluster. 
A candidate cell type would be labeled as an 
outlier if all its cells were from a single donor. 

At the major-type level, where possible, cell
types were annotated using the nomenclature 
for known brain cell types previously described 
in the literature [e.g., (13)]; otherwise, cell clus- 
ters were annotated according to either the 
regional composition or distinct marker genes 
of the cell type. One caveat of the former ap- 
proach is that cell types annotated using 
marker genes defined in rodents or nonhuman 
primates might not reflect the corresponding
gene activity in human cell types. For example,
the gene SNCG is lowly expressed in the hu- 
man major type corresponding to mouse Sncg 
cells. Nevertheless, using common nomenclature
aids cross-species comparison and the transfer
of existing knowledge.


Clustering and annotation of snm3C-seq data 


To annotate cells from snm3C-seq, we com- 
bined them with the annotated snmC-seq3 
cells and performed an iterative clustering 
analysis similar to what was described above. 
The only difference was that batch effects from 
both individuals and sequencing technologies
were corrected using the software Scanorama
(71). After each clustering iteration, the
cell-type annotations were transferred from 
snmC-seq3 cells to snm3C-seq cells with a K 
nearest-neighbor (KNN) classifier. 


Robust dendrogram of cell types 


We resampled a certain number of cells from 
each cell type without replacement to compute 
the average methylome profile for the cell type 
with genome features of 100-kb bins of both 
CG and CH methylation. The resampling num- 
ber was 800 for major types and 500 for sub- 
types. The average profiles were then used to 
compute the pairwise correlation distances. This 
process was repeated 500 times to compute an 
average pairwise distance matrix, which was 
then used to construct the final cell-type den- 
drogram through hierarchical clustering with 
average linkage. 
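
The resampling procedure can be illustrated with a small standard-library sketch that produces the averaged correlation-distance matrix. The function names and input layout are assumptions for this example, and the final hierarchical clustering with average linkage is left to a dedicated clustering library.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def averaged_distances(cells_by_type, n_sample, n_repeat, seed=0):
    """Average pairwise correlation distances over repeated resampling.

    `cells_by_type` maps a cell-type name to a list of per-cell feature
    vectors (e.g., methylation levels of 100-kb bins).
    """
    rng = random.Random(seed)
    types = sorted(cells_by_type)
    total = {(a, b): 0.0 for a in types for b in types}
    for _ in range(n_repeat):
        profile = {}
        for t in types:
            cells = cells_by_type[t]
            # Sampling without replacement, capped at the available cells.
            picked = rng.sample(cells, min(n_sample, len(cells)))
            profile[t] = [mean(column) for column in zip(*picked)]
        for a in types:
            for b in types:
                total[(a, b)] += 1.0 - pearson(profile[a], profile[b])
    return {pair: d / n_repeat for pair, d in total.items()}


toy = {
    "A": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
    "B": [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9]],
}
dist = averaged_distances(toy, n_sample=2, n_repeat=5)
```

In the setting described above, `n_sample` would be 800 for major types or 500 for subtypes, and `n_repeat` would be 500.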


Determine DMGs 


We determined the DMGs pairwise between 
cell subtypes for CG and CH methylation 
separately. To avoid potential bias caused by 
an imbalance of cell numbers of cell types, we 
downsampled cells in each cell subtype to no 
more than 500. All the protein-coding and 
long noncoding RNA genes (lncRNAs) defined
by the human GENCODE v23 were tested for 
significant methylation decrease (or hypometh- 
ylation) using the Wilcoxon rank-sum test. 
The P values were adjusted with multitest cor- 
rection using the Benjamini-Hochberg (BH) 
procedure. We computed the area under the 
receiver operating characteristic (AUROC) 
curve for the candidate genes. The genes with 
adjusted P values < 0.001 and AUROC ≥ 0.8
were considered pairwise DMGs in CG and 
CH methylation. 
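
To make the two selection statistics concrete, the sketch below implements the Benjamini-Hochberg adjustment and a pairwise AUROC in plain Python. It assumes the raw P values have already been obtained from a Wilcoxon rank-sum test in a statistics package; the function names are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auroc(low_group, high_group):
    """Probability that a value from `low_group` is below one from
    `high_group`, counting ties as 0.5."""
    wins = 0.0
    for a in low_group:
        for b in high_group:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(low_group) * len(high_group))


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
score = auroc([0.1, 0.2], [0.6, 0.7, 0.2])
```

A gene would then be kept as a pairwise DMG when its adjusted P value is below 0.001 and its AUROC meets the 0.8 cutoff.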


Determine DMRs 


We merged single-cell DNA methylation pro- 
files into the cell type (major type/subtype) 
profiles according to their cluster annotation 
in both donor-aggregated and donor-separated 
ways. Noncommon homozygous SNP CpG sites 
of these methylation profiles were filtered out 
before further analysis. We then used the 
DMRfind function of the software MethylPy
[v1.4.2; (72)] to determine the mCG DMRs
across all cell types of the donor-aggregated
profiles. The command line used is “methylpy DMRfind --output-prefix OUTPUT_FILE_NAME --samples SAMPLE_NAMES --mc-type CGN --dmr-max-dist 250 --sig-cutoff 0.01 --allc-files MC_FILES”.
We further merged the successive
DMRs if their distance is within 250 bp
and the Pearson correlation of their mCG 
fractions across 188 subtypes is greater than 
0.8. We further screened each DMR by evaluating 
the reproducibility of the methylation pattern 
across cell types between donor-aggregated 
and -separated profiles. The evaluation criteria 
were that the Pearson’s correlation coefficient 
between the mCG fractions across cell types 
was ≥0.5 and the mean absolute error was <0.1.
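
The reproducibility screen can be sketched for a single DMR as below. This assumes the two inputs are mCG fractions of the same DMR across the same ordered list of cell types; how the donor-separated profiles are combined before the comparison is not specified here, so the function simply compares two vectors.

```python
from statistics import mean


def is_reproducible(aggregated, separated, min_r=0.5, max_mae=0.1):
    """Check one DMR's mCG pattern across cell types between two profiles."""
    ma, ms = mean(aggregated), mean(separated)
    cov = sum((a - ma) * (s - ms) for a, s in zip(aggregated, separated))
    var_a = sum((a - ma) ** 2 for a in aggregated)
    var_s = sum((s - ms) ** 2 for s in separated)
    if var_a == 0 or var_s == 0:
        return False  # correlation is undefined for a flat profile
    r = cov / (var_a * var_s) ** 0.5
    mae = mean(abs(a - s) for a, s in zip(aggregated, separated))
    return r >= min_r and mae < max_mae
```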

Each reproducible DMR was then assigned 
as hypo- or hyper-DMRs in each cell type based 
on the difference of its mCG fraction from its 
robust mean. The robust-mean m of each 
DMR was calculated by averaging the mCG 
fractions between 25th and 75th percentiles 
across cell types. The DMRs with mCG frac- 
tions greater than m + 0.3 were assigned as 
the hyper-DMRs in each cell type, and lower 
than m - 0.3 were assigned as hypo-DMRs. 
DMRs containing only one CG site or with- 
out any hypo- or hyper-DMR assignment were 
excluded from further analyses. 
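
A minimal sketch of the robust-mean assignment follows. The exact percentile convention is not stated in the text, so the use of `statistics.quantiles` with the inclusive method is an assumption, as are the function name and the toy values.

```python
from statistics import mean, quantiles


def assign_dmr_states(mcg_by_type, delta=0.3):
    """Label each cell type as 'hypo' or 'hyper' for one DMR.

    The robust mean averages the mCG fractions lying between the 25th and
    75th percentiles across cell types; cell types within `delta` of it
    receive no label.
    """
    values = list(mcg_by_type.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    robust_mean = mean(v for v in values if q1 <= v <= q3)
    states = {}
    for cell_type, frac in mcg_by_type.items():
        if frac > robust_mean + delta:
            states[cell_type] = "hyper"
        elif frac < robust_mean - delta:
            states[cell_type] = "hypo"
    return robust_mean, states


m, states = assign_dmr_states(
    {"A": 0.10, "B": 0.80, "C": 0.85, "D": 0.90, "E": 0.82}
)
```

Using the interquartile values keeps a single strongly hypomethylated cell type (here `A`) from pulling the baseline toward itself.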


Motif enrichment analysis 


A total of 746 TF-binding profiles (motif) from 
JASPAR2020 (73) were used to perform the 
motif enrichment analysis. Cell-type-specific 
hypo-DMRs were first segmented into 500-bp 
bins and then annotated with each motif by 
intersecting with the genome locations of the 
motifs. Motif genome locations were down- 
loaded from http://expdata.cmmt.ubc.ca/ 
JASPAR/downloads/UCSC_tracks/2020/hg38. 
To test for motif enrichment in the major cell 
types, hypo-DMRs were used as foreground 
signals, and the hypo-DMRs of all other major 
types were used as background. For testing at 
the cell subtype level, hypo-DMRs in only the 
other subtypes that belong to the same major 
type were used as background. The one-sided 
Fisher’s exact test was used to calculate the 
P values of enrichment of the foreground against 
the background. 
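
For reference, the one-sided Fisher's exact test on a 2x2 table can be written with exact integer arithmetic. The sketch assumes the table counts 500-bp bins with and without a given motif in the foreground and background sets; that layout and the function name are assumptions for illustration.

```python
from math import comb


def fisher_enrichment_p(fg_hit, fg_total, bg_hit, bg_total):
    """One-sided (enrichment) Fisher's exact test P value.

    fg_hit / bg_hit: bins overlapping the motif in foreground / background.
    fg_total / bg_total: all bins in foreground / background.
    """
    hits = fg_hit + bg_hit
    total = fg_total + bg_total
    denom = comb(total, fg_total)
    upper = min(fg_total, hits)
    tail = 0
    # Sum hypergeometric probabilities for tables at least as extreme.
    for k in range(fg_hit, upper + 1):
        tail += comb(hits, k) * comb(total - hits, fg_total - k)
    return tail / denom


p_value = fisher_enrichment_p(3, 4, 1, 4)
```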


Integration among different single-cell datasets 
Feature matrices for human single-cell DNA 
methylation, expression, and open chromatin 


CG and CH cell-type marker genes determined 
from the mC dataset for both major and sub- 
type levels were used as the features for in- 
tegration analysis. When integrating with 
single-cell RNA sequencing (scRNA-seq) [see
companion manuscript Siletti et al. (11)] or 
snATAC-seq datasets [see companion man- 
uscript Li et al. (12)], we used the opposite 
values of gene body methylation because they
generally are strongly anti-correlated with
gene expression. Both scRNA-seq and snATAC-seq
datasets were normalized by the
averaged total unique molecular identifier
(UMI) counts of the featured genes and then
transformed by log(x + 1). The neuronal cell
types and non-neuronal cell types were inte- 
grated separately. CH methylation was used for 
neuronal cell types, and CG methylation was 
used for non-neuronal cell types. An additional 
filtering step was applied before integrating 
non-neuronal cell types, which required the 
total UMI of the featured genes of each cell to 
be >3000 for both the scRNA-seq and snATAC- 
seq datasets. 


Feature matrices for human and mouse 
single-cell DNA methylation 


We used only homologous genes between hu- 
man and mouse to perform the integration 
analysis. The list of homologous genes was 
downloaded from the Mouse Genome Informatics
(MGI) database (https://www.informatics.jax.org/homology.shtml).
The homologous
genes were selected from the same features 
used when integrating with scRNA-seq and 
snATAC-seq datasets. Human brain cells from 
THM, MB, CB, PN, and entorhinal cortices 
were excluded because no counterparts exist 
from the public mouse dataset (7). The mouse 
dataset was reannotated in the same way as 
the human dataset. CG methylation was used 
to integrate neuronal and non-neuronal cell 
types separately. 


Method to integrate different single-cell 
sequencing datasets 


After feature matrix generation, we used a 
three-step method analogous to Seurat v3 to 
project two datasets X and Y onto the same
space: (i) using canonical correlation analysis 
to capture the shared variance across cells be- 
tween datasets; (ii) finding anchors as five 
mutual nearest neighbors between the two 
datasets; and (iii) pulling the two datasets into 
the same space. To allow the scalability, we 
randomly selected 20,000 cells from each 
dataset (X,.¢ and Y,<,) as a reference to fit the 
canonical correlation analysis, and transform 
the other cells (X,,., and Y,,.,) onto the same 
CC space. Specifically, the canonical correla- 
tion vectors (CCV) of X,,.,and Y,.¢ (denoted as 
U;er and V,.f) are computed by singular value 
decomposition on their dot product, U, po = 
Xref Voop, Where Up-Ure =I and VyepVieg = I. 
Then, the CCV of X,,., and Yg,., (denoted as 
Ugry and Vg,,) are computed by Ugr = Xory 
(Viet Vre) /S and Very = Yory(Xpor Unep). U and 
V were normalized by dividing the L2 norm of 
each row and used to find mutual nearest 
neighbor anchors and score anchors using 
the same method as Seurat v3._.X and Y were 
also combined vertically and the PCs of this 
combined matrix were integrated together 




using the same method as Seurat v3 through 
the anchors generated from the previous step. 
This integration step projects the PCs of one 
dataset (query) to the PCs of the other dataset 
(reference) while keeping the PCs of the refer- 
ence dataset unchanged. The resulting PCs were 
used for visualization and finding matched 
clusters between datasets. 
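The reference-fit and query-projection step described above can be sketched with NumPy as follows. This is a minimal illustration on random toy matrices, not the authors' implementation: the function names (`fit_reference_cca`, `project_query`, `l2_normalize_rows`), the toy sizes, and the number of retained components are all assumptions, and the anchor-finding and PC-integration steps are omitted.

```python
import numpy as np

def fit_reference_cca(x_ref, y_ref, n_components):
    """SVD of the cell-by-cell dot product X_ref @ Y_ref.T (cells x shared genes inputs)."""
    u, s, vt = np.linalg.svd(x_ref @ y_ref.T, full_matrices=False)
    return u[:, :n_components], s[:n_components], vt[:n_components].T

def project_query(x_qry, y_qry, x_ref, y_ref, u_ref, s, v_ref):
    """Place cells that were not used for fitting onto the same CC space."""
    u_qry = x_qry @ (y_ref.T @ v_ref) / s   # divide each component by its singular value
    v_qry = y_qry @ (x_ref.T @ u_ref) / s
    return u_qry, v_qry

def l2_normalize_rows(m):
    """Divide each row (cell) by its L2 norm, leaving all-zero rows untouched."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

rng = np.random.default_rng(0)
x_ref = rng.normal(size=(40, 12))   # toy data: 40 reference cells x 12 shared genes
y_ref = rng.normal(size=(30, 12))   # toy data: 30 reference cells x 12 shared genes
u_ref, s, v_ref = fit_reference_cca(x_ref, y_ref, n_components=5)

# Sanity check: projecting the reference cells themselves recovers U_ref and V_ref.
u_check, v_check = project_query(x_ref, y_ref, x_ref, y_ref, u_ref, s, v_ref)
```

Fitting on a fixed-size reference keeps the SVD cost bounded, while the projection of the remaining cells is a pair of matrix products that scales linearly with the number of query cells.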


3D genome analysis 


The 3D genome features were analyzed at both 
the single-cell and pseudobulk level. For single-cell
analysis, we used scHiCluster (15) to impute
the contact matrices at 100-kb resolution with 
pad = 1, 25-kb resolution with pad = 1 for con- 
tact within 10.05 Mb, and 10-kb resolution 
with pad = 2 for contact within 5.05 Mb. To 
speed up the imputation at 10-kb resolution, 
the convolution and random walk were per- 
formed within each 30-Mb sliding window 
across each chromosome with a step size of 
10 Mb. Only the values within the 10 Mb in the 
center of the sliding window were used as the 
final result. 

For pseudobulk analysis, we merged the
cells from each group (major type, subtype,
or region) by taking the sum of raw matrices
or the average of imputed matrices over cells
within the group. If a group contained >1500 cells,
we randomly selected 1500 of them to use.
We also created a group that merged all cell types,
which was used in the compartment analysis and
embedding comparison. This group contained 5707 cells
in total, generated by randomly selecting
200 cells with ≥100,000 contacts from each
major type, except for L5-ET, for which only
107 cells were identified and all were used in
the analysis. The details of the methods are
described below.


Contact distance distribution analysis 


We generated a histogram of contacts for each 
single cell based on the distance between the 
two anchors of the contact. The bins were 
equally divided on the log2 distance scale, 
with a step size of 0.125, ranging from 2500 bp 
to 249 Mb (length of the longest chromosome). 
The i-th bin is the number of contacts with a
distance between $2500 \times 2^{0.125i}$ and
$2500 \times 2^{0.125(i+1)}$ bp. In Fig. 2B and fig. S5, the short-long
ratio was defined as the proportion of contacts 
in 51st (200 k) to 76th (2 M) bins divided by the 
proportion of contacts in 103rd (20 M) to 114th 
(50 M) bins. 
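As a rough illustration of the binning and the short-long ratio, the sketch below assigns each contact to a log2-spaced bin and takes the ratio of the two bin ranges. It assumes zero-based bin indices (so that bin 51 starts near 200 kb) and inclusive range endpoints; the function names are hypothetical.

```python
import math

BASE_BP = 2500    # lower edge of bin 0
STEP = 0.125      # bin width on the log2 distance scale

def distance_bin(distance_bp):
    """Index i with 2500 * 2**(0.125*i) <= distance < 2500 * 2**(0.125*(i+1))."""
    return math.floor(math.log2(distance_bp / BASE_BP) / STEP)

def short_long_ratio(distances_bp, short=(51, 76), long=(103, 114)):
    """Proportion of contacts in the short bins divided by that in the long bins."""
    bins = [distance_bin(d) for d in distances_bp if d >= BASE_BP]
    total = len(bins)
    n_short = sum(short[0] <= b <= short[1] for b in bins)
    n_long = sum(long[0] <= b <= long[1] for b in bins)
    if total == 0 or n_long == 0:
        return float("nan")   # ratio is undefined without long-range contacts
    return (n_short / total) / (n_long / total)
```

For example, `short_long_ratio([1_000_000, 1_000_000, 30_000_000])` counts two contacts in the short range and one in the long range.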


Compartment analysis 


Pseudo-bulk contact matrices of each chromo- 
some at 100-kb resolution were used for com- 
partment analysis. We first used the merged 
contact maps of the 5707 cells and filtered out 
the 100-kb bins with abnormal coverage. Specifically,
the coverage of bin i on chromosome
c (denoted as $R_{c,i}$) was defined as the sum of
the i-th row of the contact matrix of chromosome
c. We only kept the bins with coverage
between the 99th percentile of $R_c$ and twice
the median of $R_c$ minus the 99th percentile of
$R_c$. Contact matrices were normalized by distance,
and Pearson’s correlation matrices of
the normalized matrices were computed. These
merged contact maps were used to fit the PCA
models per chromosome. The first PC (PC1)
was used as the compartment score, and the
sign of the model was adjusted to ensure the 
compartment with higher CpG density had 
positive scores. We visually inspected PC1 of 
the merged matrices to ensure the values cor- 
respond to the plaid pattern of the correlation 
matrix rather than chromosome arms. The 
contact maps of each major type were filtered 
and converted to the correlation matrices in 
the same way as described above and were 
then transformed with the PCA models. Both 
raw matrices and imputed matrices were used 
for this analysis. We used the merged raw 
matrices for fitting the PCA model and trans- 
formed the correlation matrices of raw mat- 
rices in each cell type as raw compartment 
scores, and transformed the correlation matri- 
ces of imputed matrices in each cell type as 
imputed compartment scores. In general, the 
imputed matrices work better with smaller 
cell populations, whereas the raw matrices 
provide higher resolution when enough cells 
are merged. 
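The coverage filter used before compartment calling keeps bins whose coverage lies in a window that is symmetric around the median, with the 99th percentile as the upper edge. A minimal sketch, assuming linear interpolation for the percentile and a hypothetical function name:

```python
import statistics

def coverage_filter(coverage):
    """Return one keep/drop flag per 100-kb bin of a single chromosome.

    `coverage` holds the row sums of that chromosome's contact matrix.
    """
    # 99th percentile with linear interpolation between order statistics (an assumption).
    p99 = statistics.quantiles(coverage, n=100, method="inclusive")[98]
    median = statistics.median(coverage)
    lower = 2 * median - p99      # mirror image of the upper bound around the median
    return [lower <= value <= p99 for value in coverage]
```

Mirroring the upper bound around the median removes both poorly covered bins and bins with unusually high coverage using a single percentile.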

To determine whether the enriched longer- 
range interactions were intercompartmental 
or intracompartmental, we stratified the con- 
tact distance plot by the difference or summa- 
tion of compartment scores at the contact 
anchors. The difference of the scores reflected 
whether the contact was intercompartmental 
or intracompartmental, and the larger differ- 
ence represented intercompartmental. The 
summation of the scores distinguished wheth- 
er the contact was AA or BB for intracom- 
partmental contacts, and the large positive 
summation represents the AA interaction, 
whereas the small negative summation represents
the BB interaction. Note that longer-range
contacts contained more intercompartmental
interactions than shorter-range contacts in general, so
non-neuronal cells, which had more longer-range
interactions, also had more intercompartmental
contacts than neurons when counting
the raw contact counts. The results we reported
in fig. S5, F to K, used distance normalized 
contact counts, which reflected a relative pro- 
portion of intracompartmental or intercompart- 
mental contacts at each distance. Therefore, 
our results only suggested that a higher pro- 
portion of longer-range contacts in non-neuronal 
cells were intracompartmental, which did not 
indicate that there were more intracompart- 
mental contacts in total in non-neuronal cells. 

Saddle plots and compartment strengths 
were computed in the same way as described 
in (74). Specifically, within each chromosome,
we ranked all the 100-kb bins based on compartment
scores, and grouped the bins into 50
equal-interval groups. The distance-normalized
interaction strength between each pair of
bins or the Pearson correlation coefficient (PCC)
of mCG or mCH levels
between each pair of bins was averaged
within each group. The axes were ranked by
the compartment score of the cell types so that
BB interactions are on the top left and AA
interactions are on the bottom right. Differential
compartments (DCs) were
identified with dcHiC (75) between all major
types or neuronal major types using the raw
compartment scores as input. A large proportion
(>60%) of bins were identified as differential
with a traditional q-value threshold of
0.01. Therefore, we only selected the top DCs
with a Z-score transformed Mahalanobis dis- 
tance >1.960 (97.5 percentile of standard nor- 
mal distribution). 


Identification of domains and differential 
domain boundaries 


Domains and insulation scores were derived 
with scHiCluster at 25-kb resolution. Specif- 
ically, domains were identified within each 
single cell with TopDom (76) on the imputed 
matrices at 25-kb resolution. Insulation scores 
were computed in each cell group (major type 
or major type within a brain region) for each 
bin with the pseudo-bulk-imputed matrices 
(average over single cells) and a window size 
of 10 bins. The boundary probability of a bin is 
defined as the proportion of cells having the 
bin called as a domain boundary among the 
total number of cells from the group. 

The number of domains identified in single 
cells is correlated with the number of short- 
range reads that could affect the performance 
of imputation. To avoid computational arti- 
facts, we selected the cells from each cell type 
to match the distribution of short-range con- 
tacts across cell types and observed the same 
trend in fig. S6, D and E, which suggests these 
differences between domain numbers and 
sizes are not completely explained by the 
different short/long ratios between neurons 
and non-neurons. 

To identify differential domain boundaries 
between n cell groups, we derived an n x 2 
contingency table for each 25-kb bin, where 
the values in each row represent the number 
of cells from the group that has the bin called 
as a boundary or not as a boundary. We com- 
puted the chi-square statistic and P value of 
each bin and used the peaks of the statistics 
across the genome as differential bounda- 
ries. The peaks are defined as a local maxi- 
mum of chi-square statistics within FDR <1 x 
10° (Benjamini and Hochberg procedure). If 
two peaks were within five bins of each other, 
we only kept the peak with a higher chi-square 
statistic. We also required the peaks to have a 
Z-score transformed chi-square statistic >1.960
(97.5 percentile of standard normal distribution),
and a difference between maximum and
minimum boundary probability >0.05.
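The per-bin test and the peak merging can be sketched in plain Python as below. This is an illustrative reading of the procedure with hypothetical function names; the FDR, Z-score, and boundary-probability filters are left out, and "within five bins" is taken to mean an index difference of at most five.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an n x 2 table.

    Each row is one cell group: (cells with a boundary call, cells without).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def local_maxima(stats, min_gap=5):
    """Indices of local maxima; of two peaks within min_gap bins, keep the larger."""
    peaks = [i for i in range(len(stats))
             if (i == 0 or stats[i] > stats[i - 1])
             and (i == len(stats) - 1 or stats[i] >= stats[i + 1])]
    kept = []
    for i in sorted(peaks, key=lambda i: -stats[i]):   # strongest peaks first
        if all(abs(i - j) > min_gap for j in kept):
            kept.append(i)
    return sorted(kept)
```

Processing peaks from the strongest downward guarantees that, whenever two peaks are close, the one with the higher statistic survives.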


Identification of chromatin loops and differential loops 


Chromatin loops were identified with scHiCluster
(15) in each major type, subtype, and major type
within each brain region, respectively. We only
performed loop calling between 50 kb and 5 Mb,
given that increasing the distance only leads to
a limited increase in the number of significant
loops. For each single cell, the imputed matrix
of each chromosome ($Q_{cell}$) was log-transformed
and Z-score normalized at each diagonal (result
denoted as $E_{cell}$), and a local
background between ≥30 kb and <50 kb was then
subtracted (result denoted as $T_{cell}$), similar to SnapHiC (77).
A pseudo-bulk-level t statistic was computed
to quantify the deviation of E and T from 0
across single cells from the cell group, where
larger deviations represent higher enrichment
against global (E) or local (T) background. $E_{cell}$
was also shuffled across each diagonal to generate
$E_{shufflecell}$, and then $T_{shufflecell}$, to estimate
a background of the t statistics. An empirical
FDR can be derived by comparing the t statistics
of observed cells versus shuffled cells. We
required the pixels to have an average E > 0,
fold change >1.33 against donut and bottom
left backgrounds, fold change >1.2 against
horizontal and vertical backgrounds (77), and
FDR <0.01 compared with global (E) and local
(T) backgrounds. The loop summits were selected
from the loop pixels with a breadth-first
search algorithm, where we started from the
loop pixel with the largest E and connected it
with all the other loop pixels within 20 kb (LO
distance) with smaller E values. The loop pixel
with the largest E value in each connected
component of loop pixels was defined as a
loop summit. We only used the concept of
summit during the counting of loop summits,
and in all other cases, we used “loop” to represent
loop pixels.
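The summit selection amounts to finding connected components of loop pixels and reporting the strongest pixel of each. The sketch below is a simplified reading, not the scHiCluster code: pixels are indexed in 10-kb bins, so 20 kb corresponds to 2 bins, and the use of the Chebyshev (maximum-coordinate) distance is an assumption because the distance label in the text is ambiguous. The function name is hypothetical.

```python
from collections import deque

def loop_summits(pixels, max_dist_bins=2):
    """pixels maps (row_bin, col_bin) -> enrichment E; returns one summit per component."""
    unvisited = set(pixels)
    summits = []
    # Seeds are visited from the largest E downward, so the first pixel reached
    # in each connected component is that component's summit.
    for seed in sorted(pixels, key=pixels.get, reverse=True):
        if seed not in unvisited:
            continue
        unvisited.discard(seed)
        summits.append(seed)
        queue = deque([seed])
        while queue:                      # breadth-first search over nearby pixels
            r, c = queue.popleft()
            near = [p for p in unvisited
                    if max(abs(p[0] - r), abs(p[1] - c)) <= max_dist_bins]
            for p in near:
                unvisited.discard(p)
                queue.append(p)
    return summits
```

Summits are returned in decreasing order of E, which makes it easy to count them or keep only the strongest ones.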

To compare the interaction strength of
loops between different groups of cells, we
adopted an analysis of variance (ANOVA) framework
to compute the F statistics for each loop
identified in at least one cell group using either
$Q_{cell}$ (result denoted as $F_Q$) or $T_{cell}$ (result
denoted as $F_T$). Then, we Z-scored $F_Q$ and $F_T$
across all the loops being tested and selected
the ones with $F_Q$ and $F_T$ > 1.036 (85th percentile
of standard normal distribution) as differential
loops. The threshold was decided by
visually inspecting the contact maps as well as
the correlation of interaction and loop anchor
CG methylation.
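A compact sketch of the differential-loop selection is given below: a one-way ANOVA F statistic per loop, followed by Z-scoring across loops and thresholding both statistics. The function names are hypothetical, and using the population standard deviation for the Z-score is an assumption.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic; `groups` holds one list of per-cell values per cell group."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def z_scores(values):
    """Standardize values across loops (population standard deviation, an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def differential_loops(f_q, f_t, threshold=1.036):
    """Indices of loops whose Z-scored F statistics from Q and from T both exceed the threshold."""
    z_q, z_t = z_scores(f_q), z_scores(f_t)
    return [i for i, (a, b) in enumerate(zip(z_q, z_t)) if a > threshold and b > threshold]
```

Requiring both statistics to pass means a loop must differ between groups in raw imputed strength and in enrichment over its local background.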

Motif enrichment analysis between differential
loops and constant loops was carried
out after controlling for the interaction strength
and the enrichment against the local background.
We first generated a pool of constant
loops whose Z-scored $F_Q$ and $F_T$ < 0. Then we
grouped the differential and constant loops
into 100 x 100 groups based on $F_Q$ and $F_T$.
We selected the same number of loops from
each group for differential and constant loops
and compared the motif enrichment in the
differential loops or constant loops against
the union of them.

Differential loops were identified in 11 com- 
parisons, including between all major types; 
between neuronal major types; between neu- 
ron, glia (ASC, ODC, and OPC), MGC, PC, EC, 
and VLMC; between glial major types; be- 
tween excitatory neurons (L2/3-IT, L4-IT, L5- 
IT, L6-IT, L6-IT-Car3, L5/6-NP, L6-CT, L6b, 
L5-ET, and Amy-Exc), inhibitory neurons 
(Lamp5, Lamp5-Lhx6, Sncg, Vip, Pvalb, Pvalb- 
ChC, Sst, and Chd7), cerebral nucleus neurons 
(MSN-D1, MSN-D2, and Foxp2), and SubCtx- 
Cplx; between excitatory major types; between 
inhibitory major types; between cerebral nu- 
cleus major types; between IT major types 
(L2/3-IT, L4-IT, L5-IT, and L6-IT); between 
caudal ganglionic eminence (CGE)-derived 
major types (Lamp5, Lamp5-Lhx6, Sncg, Vip); 
and between medium spiny neuron major 
types (MSN-D1 and MSN-D2). The results can 
be found in table S6. Aggregate peak analysis 
of some of the comparisons is shown in fig. S8. 
For each single loop pixel, the imputed contact 
map from -100 kb to +100 kb was selected and 
min-max normalized to the range of 0 to 1, 
and averaged across all the differential loops 
that have a fold change of Q and T greater
than 1.2 and 1.5, respectively, comparing the 
average of foreground cell types and the aver- 
age of background cell types. 


Single-cell embedding based on different 3D
genome features


Contact based


The imputed contacts at 100-kb resolution
with distance ≥100 kb and <1 Mb were used as
features for singular value decomposition
dimension reduction. The first 30 principal components
were normalized by singular values
and L2 norms per cell and then used for 
t-distributed stochastic neighbor embedding 
(t-SNE) visualization in Fig. 1G and fig. S2F. To
better visualize the heterogeneity of neuronal
cells, we downsampled each of the non-neuronal
cell populations to 1000 cells, used them together
with all the neurons to fit the t-SNE, and then
projected the other non-neuronal cells back. Higashi
(20) and fastHigashi (27) were used in the 
comparison of embedding in fig. S7. Because
of the memory limitation (256 GB), we only ran
these tools at 500-kb resolution rather than
100 kb (same as suggested in their papers and
GitHub pages). The donor information was used
as a confounding factor in the models to avoid 
the embedding being driven by donor differences. 


Compartment based 


Single-cell compartment scores were computed
using either the CpG density method or the
eigenvector method on raw contact maps or
contact maps imputed by scHiCluster or Higashi.
The CpG method used the CpG density of each
100-kb bin across the genome, and a value for
each bin was computed for each single cell as
the average of CpG density of other 100-kb bins
on the same chromosome, weighted by the
interaction strength between the two bins. The
eigenvector method used an average of imputed
matrices across all single cells to compute
the correlation matrices (as described
in the compartment analysis) and fit PCA
models to transform the correlation matrices
of all single cells. Higashi was also used to impute
the contact maps after generating the cell
embedding as described in the “contact based”
section above, using either 0 neighbors or
5 neighbors on the embedding space to help
imputation. Although using 5 neighbors generated
compartments with higher cell-type specificity,
this method enforced the smoothing
of information on the cell embedding, which
could artificially augment the difference between
imputed matrices when the embedding
can separate the cell types well. This
makes it challenging to determine whether the separation
of cell types is due to the intrinsic heterogeneity
of compartments across single cells or
to the smoothing of the embedding. We therefore
still show the result with 0 neighbors in fig. S7.
We then performed singular value decomposition
on the cell-by-bin compartment score matrix
$X = U S V^{T}$ to derive the cell embedding U.


Loop based 


We combined the loop pixels identified in all 
major types to make a meta loop list and gen- 
erated a binary cell-by-loop matrix where each 
element indicated whether a contact was de- 
tected in the cell at the loop pixel. Latent se- 
mantic analysis with log term frequency was 
applied to the binary matrix (denoted as A) 
to compute the embedding. Specifically, we
selected the columns having 1 in more than
5 rows, and then computed the column sum of
the matrix ($colsum_j = \sum_{i=1}^{n_{cell}} A_{ij}$) and kept
only the bins with Z-scored $\log_2 colsum$ between
-2 and 2. The filtered matrix was normalized
by dividing by the row sum of the matrix to generate
a term frequency matrix TF, and further
converted to X used for singular value decomposition
$X = U S V^{T}$, where
$X_{ij} = \log(TF_{ij} \times 100000 + 1) \times \log\left(1 + \frac{n_{cell}}{colsum_j}\right)$.
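The term-frequency weighting applied before the singular value decomposition can be written out as a short sketch. It assumes natural logarithms, assumes that the column filter has already been applied (so every column sum is positive), and uses a hypothetical function name; the decomposition itself is omitted.

```python
import math

def lsi_transform(a, scale=100000):
    """Convert a binary cell-by-feature matrix (list of rows) into the weighted matrix X."""
    n_cell = len(a)
    colsum = [sum(col) for col in zip(*a)]          # number of cells with each feature
    x = []
    for row in a:
        row_sum = sum(row)
        tf = [v / row_sum if row_sum else 0.0 for v in row]   # term frequency per cell
        x.append([math.log(tf_ij * scale + 1) * math.log(1 + n_cell / cs)
                  for tf_ij, cs in zip(tf, colsum)])
    return x
```

The first factor dampens differences in per-cell coverage, and the second down-weights features that are detected in most cells.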


We also generated a cell-by-loop matrix B 
where each element indicated the imputed 
contact at each loop pixel in each single cell. 
This is a dense matrix of 5.7 k x 3.2 M, which 
limits the ability of this method to scale up to 
all cells in our dataset. We performed eigenvalue
decomposition $BB^{T} = U S U^{T}$ and ranked the
eigenvectors by the eigenvalues from large to
small to derive the cell embedding U.


Domain boundary based


We generated a binary cell-by-25-kb-bin matrix
where each element indicated whether the
bin was identified as a domain boundary in the
cell. The same latent semantic analysis (LSI)
framework was used to obtain the cell embedding.


Clustering benchmark 


We applied L2 normalization within each cell
on the top dimensions for all embeddings. For
t-SNE visualization, we used the top 25 dimensions,
except for Higashi and fastHigashi, for which we
used the top 128 dimensions. K-means was used to
perform clustering on the top 50 dimensions,
except for Higashi and fastHigashi, for which
128 dimensions were used. k was enumerated
from 3 to 12 and the result with the highest
adjusted Rand index (ARI) compared with the
cluster labels was shown in fig. S7. To benchmark
the ability to separate excitatory cell types, 
we used L2/3-IT, L4-IT, L5-IT, L6-IT, L6-IT- 
Car3, L5/6-NP, L6b, L6-CT, and L5-ET. 
For inhibitory cell types, we used Lamp5- 
Lhx6, Lamp5, Sncg, Vip, Pvalb-ChC, Pvalb, 
and Sst. 
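For reference, the adjusted Rand index and the selection of the best k can be sketched in plain Python. The K-means step itself is not shown; `clusterings` is a hypothetical mapping from each k to the labels produced at that k.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))     # contingency table cells
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0                                           # degenerate identical partitions
    return (sum_cells - expected) / (maximum - expected)

def best_k(labels_true, clusterings):
    """Return (k, ARI) for the clustering with the highest ARI."""
    scores = {k: adjusted_rand_index(labels_true, pred) for k, pred in clusterings.items()}
    k = max(scores, key=scores.get)
    return k, scores[k]
```

Because the ARI is corrected for chance, it can be compared across different values of k without favoring finer partitions.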

The failure to resolve cell types could be
due to biological reasons, i.e., the
differences of compartments across cell types
were small or the heterogeneity of compartments
across cells within the same cell type
was large, or to technical reasons, i.e., the
power of algorithms to identify compartments
on single-cell Hi-C data is limited. On the
basis of the other analyses, we can identify
the differential compartments between neuronal
cell types as well as excitatory or inhibitory
subtypes that strongly correlate with gene
expression. This suggests that compartment
differences exist at the pseudobulk level between
finer-scale cell types that cannot be
distinguished in compartment-based single-cell
embedding. Thus, the single-cell heterogeneity
or the computational challenges could
be the major determinants. The analysis of
chromosome imaging data could help further
distinguish the two factors given that they are
usually considered the gold standard for chromatin
structures and do not need imputation
algorithms for compartment calling. A previous
study concluded that differences between
compartments across single cells are small (78).
However, how large these differences are
relative to the across-cell-type differences
remains elusive and would need chromosome- or
genome-level chromosome imaging data
from complex tissues to resolve. In summary,
this result is the combined effect of
biological feature specificity and computational
limitations for quantifying the
features accurately within single cells, and
the conclusion is drawn from the best
methods to date that we can apply and might
be challenged by technology and algorithm
improvement.


Comparison between differential 3D genome
structure and other modalities


Compartment

The raw compartment scores were quantile 
normalized across cell types. For each 100-kb 
bin, we used this normalized score to com- 
pute its PCC with the ATAC, mCG, and mCH 
signals at the same 100-kb bin across cell 
types. We also computed PCC between the normalized
compartment scores and the expression
level of genes whose promoters (TSS ±
2 kb) or gene bodies (TSS - 2 kb to TES +
2 kb) overlap with the 100-kb bin.


Domain 


The boundary probability was defined at the 
start position of each 25-kb bin. We used the 
ATAC, mCG, and mCH signals at the upstream 
and downstream 10-kb bin of the boundary 
and took the average signal of the two bins to 
compute PCC with the boundary probability. 
We also computed PCC between the boundary 
probabilities with the expression level of genes 
whose promoters or gene bodies overlap with 
the two 10-kb bins. 


Loop 


The interaction strength was defined for each 
loop pixel between two 10-kb bins. We used 
the ATAC, mCG, and mCH signals at the two 
anchor bins of the loop and took the average 
signal of the two bins to compute PCC with the 
imputed loop strength (Q). We also computed 
PCC between the loop strength with the expres- 
sion level of genes whose promoters or gene 
bodies overlap with the two 10-kb bins, or whose 
gene bodies are between the two 10-kb bins. 

Note that the differences between compart- 
ment and domain or loop in correlation analy- 
ses could be due to the different resolution, 
given the usage of 100-kb resolution for ATAC 
and methylation could dilute the signals of reg- 
ulatory elements. Domain and loop are more 
comparable given that the quantification of 
ATAC and methylation signals are at 10-kb 
resolution for the analyses. 


DEGs and comparison with 
3D genome structures 


Because of the differences between major type 
annotation, we assigned each RNA cell a major 
type label according to the mC cell based on the 
integration of neuronal cells between scRNA- 
seq data and snmC-seq data. The non-neuronal 
cells were labeled according to their original 
annotation given the clear correspondence be- 
tween the two annotations. We randomly se- 
lected 1000 RNA cells from each major type, 
where the probability of a neuronal cell being 
chosen is proportional to the confidence of 
label transfer from mC cells to that RNA cell. 
This procedure provides 29,000 RNA cells in 
total from the 29 major types used in our 3C 
analysis. For each cluster pair, the P values
were derived with the Wilcoxon rank-sum test,
and the fold change was computed as the ratio
between the average expression levels across
cells in the two clusters. The genes with an
absolute value of log2-fold change >1 and FDR 
(BH procedure) values <0.01 were considered 
as differentially expressed. The top 100 DEGs 
with the smallest FDR (BH procedure) were 
used as top DEGs between the cluster pair and 
the top results from all possible pairs were 
concatenated and duplicates were removed to 
generate a final list of top DEGs. This analysis 
identified 1099 top DEGs between neuronal 
major types and 1358 DEGs between all major 
types on autosomes. 
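The selection of top DEGs for one cluster pair can be illustrated as follows. The sketch takes precomputed P values and log2 fold changes as input (the Wilcoxon test is not shown), and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min             # enforce monotonicity of adjusted values
    return adjusted

def select_top_degs(genes, log2_fc, p_values, top_n=100):
    """Genes with |log2 fold change| > 1 and FDR < 0.01, ordered by FDR, capped at top_n."""
    fdr = benjamini_hochberg(p_values)
    hits = [(fdr[i], genes[i]) for i in range(len(genes))
            if abs(log2_fc[i]) > 1 and fdr[i] < 0.01]
    return [gene for _, gene in sorted(hits)[:top_n]]
```

The final list described in the text would then be the union of these per-pair lists with duplicates removed, for example by collecting them into a `set`.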

We then calculated the PCC between 3D ge- 
nome structures and gene expression across 
neuronal major types. To avoid the bias introduced by
the cutoff selection for differential analysis, we 
grouped the bins and genes based on the dif- 
ferential statistics and investigated the corre- 
lation for the bins and genes assigned to each 
group (figs. S11, E and F, and S12, E and F). For 
the expression level of each DEG, we also com- 
puted its correlation with the quantile normal- 
ized compartment scores of each 100-kb bin, 
the boundary probability of each position with 
25-kb sliding interval, or the strength of loops 
within TSS - 5 Mb to TES + 5 Mb region of the 
gene (Fig. 2, K to N). We shuffled the 3D genome
features and genes within each major
type to calculate null PCC and estimate FDR.
For each PCC value (denoted as x) between
gene i and 3D genome feature j, a left-side
FDR was computed as the ratio between the
proportion of shuffled PCC smaller than x and
the proportion of observed PCC smaller than
x, and a right-side FDR was computed as the
ratio between the proportion of shuffled PCC
greater than x and the proportion of observed
PCC greater than x. The PCC thresholds corresponding
to left-side and right-side FDR < 0.01
were computed (denoted as tl and tr), and the
final PCC threshold for significance was determined
as ±max[abs(tl), abs(tr)].
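One way to turn the two one-sided empirical FDRs into a single symmetric threshold is sketched below. How the threshold is located is not specified in the text, so scanning a fixed grid outward from zero and taking the first value that passes is an assumption, as are the function names and the grid step.

```python
def side_fdr(x, observed, shuffled, side):
    """Ratio of the shuffled to the observed proportion of PCC values beyond x."""
    if side == "left":
        n_obs = sum(v < x for v in observed)
        n_null = sum(v < x for v in shuffled)
    else:
        n_obs = sum(v > x for v in observed)
        n_null = sum(v > x for v in shuffled)
    if n_obs == 0:
        return 1.0                      # nothing observed beyond x: treat as not significant
    return (n_null / len(shuffled)) / (n_obs / len(observed))

def pcc_threshold(observed, shuffled, cutoff=0.01, step=0.01):
    """Symmetric PCC threshold max(|tl|, |tr|) from left-side and right-side FDR."""
    grid = [round(k * step, 10) for k in range(1, int(1 / step) + 1)]
    tr = next((x for x in grid if side_fdr(x, observed, shuffled, "right") < cutoff), 1.0)
    tl = next((-x for x in grid if side_fdr(-x, observed, shuffled, "left") < cutoff), -1.0)
    return max(abs(tl), abs(tr))
```

Taking the larger of the two magnitudes makes the cutoff conservative on whichever side has the weaker separation from the shuffled background.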


CRE prediction 


On the basis of the pairwise CH-DMGs deter- 
mined between cell subtypes, we assigned a 
gene as hypomethylated in one subtype if it 
was a hypomethylated DMG in at least 40 out 
of 187 pairs compared with other subtypes. A 
DMR was assigned to a subtype if it was either 
CG hypomethylated in the subtype (see “De- 
termine differentially methylated regions” sec- 
tion above) or its CG-methylation level was 
<0.3. A DMR was considered as a candidate 
CRE if it was connected by a differential loop 
to a gene that was also a DMG in the same 
subtype. We did not require the differential 
loop connecting the DMR-DMG pair to be a 
loop detected in the subtype of the pair. The 
reason for this loose criterion is threefold. 
First, the strength of the differential loop
anti-correlates with the methylation levels (Fig.
2K). If a DMR-DMG pair is connected with a
differential loop in one subtype, then the loop
likely exists in another subtype in which the
DMR-DMG pair has a similar methylation status.
Second, CREs are usually pleiotropic (79). The
loops detected in subtypes covered by the m3C
dataset could be reused in another, uncovered
subtype if the methylation status of the DMR-DMG
pair is similar. Finally, loops could be missed
in detection because of either the limitation of the
computation methods or the insufficient coverage
in certain subtypes. Transferring DNA looping
information among subtypes could
cope with such a situation to some extent.


Association between brain disorder risk variants 
and DMRs across cell types 


We obtained GWAS summary statistics for 
quantitative traits related to neurological disease
and control traits of intelligence (80), educational
attainment (81), alcohol usage (82),
Alzheimer’s disease (83), bipolar disorder (84),
attention deficit hyperactivity disorder (85), neu- 
roticism (86), schizophrenia (87), amyotrophic 
lateral sclerosis (88), tobacco use disorder (89), 
insomnia (90), sleep duration, coronary artery 
disease (91), height, tiredness (92), type 1 dia- 
betes (93), type 2 diabetes (94), allergy (95), 
birth length (96), and birth weight (97). 

We prepared summary statistics in the stan- 
dard format for linkage disequilibrium score 
regression. Next, we converted major-type hypo- 
DMRs to human genome assembly GRCh37 
(hg19) coordinates using the software Lift- 
Over, and annotated with the 1000 Genomes 
Project Phase 3 SNPs (98). The superset of the 
hypo-DMRs was used as the background. Finally,
we used cell-type-specific linkage disequilibrium
score regression [https://github.com/bulik/ldsc; (40)]
to estimate the enrichment
coefficient of each annotation for each trait.


Brain regional axes from DNA methylation profiles 


Both CG-methylated and CH-methylated highly 
variable 100-kb bins of one cell type (the same 
features for clustering analysis) were used to 
compute a lower dimensional representation 
with PCA. First, a neighbor graph of the cells 
was constructed in the PCA space. Then, for 
each cell, a regional identity vector was com- 
puted by averaging the location informa- 
tion of this cell and its neighbors. A pairwise 
Manhattan distance matrix was then con- 
structed from the regional identity vectors to 
capture relations among brain regions.
Principal coordinate analysis was applied to this
distance matrix to obtain a lower-dimensional
embedding in the regional space while
preserving relative distances among cells. Thus,
the cells were transformed from the methylome
space to the regional space. In the regional
space, we performed the trajectory analysis with
the elastic principal graph algorithm implemented
in STREAM (49). The parameters epg_alpha,
epg_mu, and epg_lambda were manually
adjusted to ensure the resulting trajectories
well represented the distributions of the
cells in the regional space. Each cell was assigned
a regional index (or pseudotime) ranging
in [0, 1] according to its relative position to the
trajectory. The cells were then grouped into
20 bins along the trajectories based on their
regional index. The mean DNA methylation
profiles can be computed for each bin.
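The step from the neighbor graph to the regional distance matrix can be illustrated with a small sketch. Encoding each cell's location as a one-hot vector over dissection regions is an assumption (the text only says "location information"), and the function names and toy graph are hypothetical. The principal coordinate analysis and the STREAM trajectory fit are not shown.

```python
def regional_identity(cell_regions, neighbors, regions):
    """Average the one-hot region labels of each cell and its graph neighbors."""
    vectors = []
    for i, nbrs in enumerate(neighbors):
        members = [i] + list(nbrs)          # the cell itself plus its neighbors
        vectors.append([sum(cell_regions[m] == r for m in members) / len(members)
                        for r in regions])
    return vectors

def manhattan(u, v):
    """Manhattan (L1) distance between two regional identity vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(vectors):
    """Pairwise Manhattan distances, the input to principal coordinate analysis."""
    return [[manhattan(u, v) for v in vectors] for u in vectors]
```

Averaging over neighbors in methylome space turns a discrete region label into a graded profile, so cells that sit between regions receive intermediate regional identities.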


Consensus regional axis for cortex and 
basal ganglia 


The mean regional index was first computed 
for cells from each cortical region in each cell 
type. Then the average regional indices were 
calculated by averaging the mean regional 
indices across the corresponding cell types. 
Finally, the consensus regional axis was con- 
structed by ranking the average indices. 


Regional DMGs 


We used a one-versus-rest strategy to calcu- 
late region-specific CH-DMGs (rDMGs) within 
major types from the cortex and basal ganglia. 
To avoid potential bias caused by an imbal- 
ance of cell numbers in different regions, we 
downsampled cells in each region to no more 
than 500. Using the Wilcoxon rank-sum test, 
protein-coding genes and lncRNAs were tested
for significant methylation decrease (or hypo- 
methylation). The P values were adjusted with 
multitest correction using the BH procedure. 
The genes with adjusted P values < 1° and 
log2-fold change < -0.1 were considered rDMGs. 


Regional DMRs 


Cells from the same brain region were merged 
for each major type to construct the regional 
pseudo-bulk methylation profiles. Then, the 
DMRfind function of the software MethylPy 
was used to determine the candidate rDMRs 
with the same options as in determining cell-type
DMRs. If a candidate rDMR had CG-methylation
variation ≥ 0.6 across the regions tested,
it was considered an rDMR.


Regional enriched motifs of TFs 


For simplicity, we selected rDMRs with methylation
levels changing monotonically (PCC ≥ 0.5 or
< -0.5) with the regional axes that we revealed
in cortex and basal ganglia (Fig. 4, C and H), 
and performed the TF motif enrichment analy- 
sis against cell-type-specific hypomethylated 
DMRs. TF motifs with an adjusted P value < 
1 x 10°°° were considered to be enriched. 


Enrichment analysis on conserved DMRs 
between human and mouse 

Functional enrichment analysis of hcCnsvDMRs


The Genomic Regions Enrichment of Annotations
Tool (GREAT) (99) was used to compute
the Gene Ontology (GO) term enrichment of
hcCnsvDMRs. The “Basal+extension” option (5.0 kb
upstream, 1.0 kb downstream, and up to 100 kb
max extension) was selected for gene association,
and “curated regulatory domains” were
included in the analysis.


Comparison between hcCnsvDMRs 
and histone modification marks 
in mouse forebrains 


Replicated peaks of histone modification marks 
of P0 mouse forebrain were downloaded from
the ENCODE project (4). Specifically, H3K27ac
(ENCFF044YBD), H3K27me3 (ENCFF461UUN),
H3K4me1 (ENCFF467MYU), H3K4me3
(ENCFF066LGF), and H3K9me3 (ENCFF997XJK)
were used. The software Genomic Association
Tester (GAT, v1.3.6) (100) was used to
compute the enrichment of hcCnsvDMRs in the
histone modification marks. Accessibility of 
hcCnsvDMRs was determined by comparing 
them with snATAC-seq peaks profiled from 
P56 mouse brains (2). 


scMCode construction 
Candidate CpG sites 


We constructed the pseudo-bulk mCG profile
for each major type and then iteratively
selected CpG sites to distinguish all major
types. In each iteration, CpG sites were
selected according to the following criteria:
(i) they were either almost entirely
methylated (mCG% ≥ 80%) or unmethylated
(mCG% < 20%) among all the remaining major
types; (ii) both methylation states were
present among the remaining major types; and
(iii) the CpG sites had coverage ≥10 in ≥80%
of the remaining major types. These selected
CpG sites were added to the CpG site pool for
later scMCode construction. In each iteration,
the methylation levels of the selected CpGs in
the remaining major types were binarized if
they were >80% or <20%. Pairwise distances
were computed between binarized methylation
statuses, and the cell types that had a
distance <20 to any of the other major types
were kept for the next iteration of CpG
selection. In total, 221,140 CpG sites were
selected as candidates for scMCode
construction.
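
The three per-iteration criteria can be sketched as a single filter applied to one CpG site. This is an illustrative sketch under stated assumptions: the function name, the input layout (one `(mcg_fraction, coverage)` pair per remaining major type), and the choice to evaluate criteria (i) and (ii) only on sufficiently covered major types are not taken from the study.

```python
def is_candidate_site(profiles, high=0.8, low=0.2, min_cov=10, min_frac_covered=0.8):
    """Apply criteria (i) to (iii) to one CpG site.

    profiles: list of (mcg_fraction, coverage) tuples, one per remaining
    major type (hypothetical input layout).
    """
    n_types = len(profiles)
    # (iii) enough of the remaining major types must be sufficiently covered.
    covered = [(frac, cov) for frac, cov in profiles if cov >= min_cov]
    if len(covered) < min_frac_covered * n_types:
        return False
    fractions = [frac for frac, _ in covered]
    # (i) every covered major type is almost fully methylated or unmethylated.
    if any(low <= frac < high for frac in fractions):
        return False
    # (ii) both states must occur, otherwise the site separates nothing.
    has_high = any(frac >= high for frac in fractions)
    has_low = any(frac < low for frac in fractions)
    return has_high and has_low
```

Sites passing this filter would be added to the candidate pool, after which major types that remain indistinguishable are carried into the next iteration.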


CpG site selection for scMCode 


The methylation levels of candidate CpG sites
across all the major types were trinary
discretized based on their DNAm fractions
(discretized values were -1 for mCG% < 20%,
1 for mCG% ≥ 80%, and 0 for 20% < mCG% < 80%)
at the major-type pseudo-bulk level. The CpG
sites were further grouped into 38,945
features based on their discretized DNAm
status across major types. To prevent bias in
scMCode caused by cell-type population
differences or individual variation among
donors, we randomly selected 300 cells from
each major type from each donor as the dataset
for scMCode construction. In each cell, the
methylation state of each feature was computed
either by averaging the methylation levels of
all the CpG sites belonging to this feature
(AverageCpG) or by directly using the
methylation level of a randomly picked single
CpG site belonging to this feature
(RandomCpG). The cell-by-feature matrix was
trinary discretized based on the DNAm
fractions (discretized values were -1 for
mCG% < 50%, 1 for mCG% > 50%, and 0 for
mCG% = 50% or uncovered) at the single-cell
level, and then used to train a random forest
model to predict major types. A 4-fold
cross-validation scheme was used to prevent
overfitting. Finally, the top 800 most
important features were selected to construct
the scMCode for major types. We observed no
difference in prediction performance between
AverageCpG and RandomCpG, indicating the
robustness of scMCode.
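
The two discretization rules and the AverageCpG feature state can be written compactly. This is a minimal sketch: the function names and the use of `None` for an uncovered site are assumptions made for illustration, and a pseudo-bulk value of exactly 20%, which the text does not assign, is mapped to 0 here.

```python
def discretize_pseudobulk(mcg_fraction, low=0.2, high=0.8):
    """Trinary state at the pseudo-bulk level: -1, 0, or 1."""
    if mcg_fraction >= high:
        return 1
    if mcg_fraction < low:
        return -1
    return 0  # intermediate methylation (exactly `low` also lands here)


def discretize_single_cell(mcg_fraction):
    """Trinary state at the single-cell level; None means uncovered."""
    if mcg_fraction is None or mcg_fraction == 0.5:
        return 0
    return 1 if mcg_fraction > 0.5 else -1


def average_cpg_state(site_fractions):
    """AverageCpG: average the covered CpG sites of a feature, then discretize."""
    covered = [f for f in site_fractions if f is not None]
    if not covered:
        return 0  # no covered site in this feature for this cell
    return discretize_single_cell(sum(covered) / len(covered))
```

The RandomCpG variant would instead pass the methylation level of one randomly chosen site of the feature to `discretize_single_cell`.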


Boost the prediction accuracy with 
KNN imputation 


We achieved ~88% single-cell prediction
accuracy directly with the cell-by-feature matrix.
Given the limited coverage of single-cell data, 
the cell-by-feature matrix could be further im- 
puted to improve the prediction accuracy. 
Within the training dataset, we randomly se- 
lected half of the cells and merged them into 
pseudo-cells according to their major types. 
This process was repeated 20 times, and a 
pseudo-cell-by-feature matrix was constructed 
from these pseudo-cells in the same way. A 
KNN imputer was built upon this matrix. The 
testing dataset was first imputed with the 
KNN imputer and then fed to the random for- 
est model for prediction. The KNN imputation 
step improved the prediction accuracy to ~93%. 
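
The idea behind the imputation step can be illustrated with a small nearest-neighbor sketch. It is not the imputer used in the study: the function name, the use of `None` for uncovered features, the squared-difference distance over covered features, and the default `k` are all assumptions chosen to keep the example self-contained.

```python
def knn_impute(cell, pseudo_cells, k=3):
    """Fill uncovered (None) features of `cell` from the k most similar pseudo-cells.

    cell: list of feature values, with None where the cell has no coverage.
    pseudo_cells: list of fully observed reference rows of the same length.
    """
    observed = [i for i, value in enumerate(cell) if value is not None]

    def distance(ref):
        # Compare only on features that the query cell actually covers.
        if not observed:
            return 0.0
        return sum((cell[i] - ref[i]) ** 2 for i in observed) / len(observed)

    neighbors = sorted(pseudo_cells, key=distance)[:k]
    return [
        value if value is not None
        else sum(ref[i] for ref in neighbors) / len(neighbors)
        for i, value in enumerate(cell)
    ]
```

Because pseudo-cells pool many cells of one major type, their feature rows are nearly complete, which is what makes them usable as references for sparse single cells.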


Cross-donor tests 


We derived scMCodes for each donor and 
trained the KNN imputer and random forest 
classifier correspondingly. The single-donor 
scMCodes and models were then applied to 
the data of other donors to assess the cross- 
individual robustness. When the training and 
testing donors were the same (diagonal in Fig.
6E), a 4-fold cross-validation scheme was 
used to prevent overfitting and to assess the 
accuracy.


INTRODUCTION: The mammalian brain comprises
billions of neurons and glia capable of
executing highly complex behaviors. These cells 
are organized into several major functional 
regions with distinct developmental origins. 
Although the cerebral cortex is the most well 
studied because of its role in cognition, the 
other regions are no less essential. In the past 
several years, single-cell genomic methods 
have revolutionized our understanding of the 
brain’s cellular diversity, revealing hundreds 
of transcriptomic cell types across the mouse 
brain. Prior work has shown that transcrip- 
tional cell types can be aligned with other 
modalities—e.g., electrophysiology, morphol- 
ogy, and connectivity—as well as across large 
evolutionary distances. However, the human 
brain has not been comprehensively surveyed, 
and few regions outside the cerebral cortex 
have been profiled. Thus, the overall number, 
distribution, and region-specificity of human 
neurons and glia remain unknown. 


RATIONALE: As a first step toward a brain-wide 
census of cell types, we used single-nucleus
RNA sequencing to profile cells sampled from
throughout the entire human brain. We iso- 
lated postmortem tissue from three donors 
and enriched for neurons from approximately 
100 locations across the forebrain (the cerebral 
cortex, hippocampus, cerebral nuclei, hypo- 
thalamus, and thalamus), midbrain, and hind- 
brain (the pons, medulla, and cerebellum). 
The final dataset comprised more than three 
million cells, including more than two million 
neurons, which we clustered iteratively into 
31 superclusters, 461 clusters, and 3313 sub- 
clusters. This top-down approach enabled us 
to examine and compare heterogeneity within 
and across cell classes and regions. 


RESULTS: Neurons varied extensively across 
brain regions. Many neuronal superclusters 
comprised cells mainly localized to specific 
brain regions. Cell states moreover broadly 
mirrored their developmental history. For
example, several superclusters distributed
across the telencephalon, the developmental
compartment that produces the cortex,
hippocampus, and cerebral nuclei. Cortical
clusters comprised layer-specific excitatory
neurons as well as distinct inhibitory
interneurons with distinct developmental
origins. Other superclusters reflected
cellular migration during development,
including midbrain-derived inhibitory neurons
located in the thalamus, which
transcriptionally aligned with midbrain
neurons.


Cellular diversity across the entire human brain. Approximately 100 anatomical locations were dissected
from the human brain in three donors followed by single-nucleus RNA sequencing (left). Three levels
of clustering revealed the cell type composition of the human brain (center). Whereas cortical neurons
varied more gradually, brainstem neurons were unexpectedly diverse with a large number of small
clusters indicating distinct cell types organized by combinatorial gene expression (right). Although less
heterogeneous than neurons, glial cells, such as astrocytes and oligodendrocyte precursors, also differed
between the cortex and the brainstem (right).

Within regions, neuronal subtypes were not 
distributed according to simple rules. Dissec- 
tions differed from one another according to 
both specific cell types and cell type propor- 
tions. Notably, neurons were particularly di- 
verse outside the cortex. The hypothalamus, 
midbrain, and hindbrain contained markedly 
high neuronal heterogeneity, consistent with 
their diverse functions. These neurons were also 
organized less hierarchically compared with 
cortical neurons: Many belonged to a single 
supercluster that uniquely contained both
inhibitory and excitatory neurons along with
serotonergic and dopaminergic neurons. These
neurons combinatorially expressed many neu- 
rotransmitters and neuropeptides. 

Glia also varied across brain regions, and 
their diversity similarly reflected development. 
In particular, both astrocytes and
oligodendrocyte precursors formed two major groups
enriched within and outside the telencephalon.
By contrast, mature oligodendrocytes exhib- 
ited two major types found across the entire 
brain. However, these oligodendrocyte types 
existed in different proportions inside and out- 
side the telencephalon, which suggests that 
even the relatively homogeneous oligodendro- 
cyte lineage exhibits regional variation. 


CONCLUSION: Our findings suggest that each 
brain area contains a specific complement 
of cell types and states, which implies that a 
complete characterization of cell types will
require deep tissue sampling, particularly
outside the cortex. The telencephalon appears
unique with respect to other brain regions, 
across both neurons and glial cells, whereas 
the brainstem comprises an extremely diverse 
set of neurons that may support innate be- 
haviors. These observations have implications 
for a range of human diseases that exhibit re- 
gional variation, including cancer and neu- 
rodegenerative disease. Our work therefore 
provides a basis for exploring the role of neu- 
roepithelial diversity in human health and 
disease.


The human brain directs complex behaviors, ranging from fine motor skills to abstract intelligence, but
the diversity of cell types that support these skills has not been fully described. In this work, we
used single-nucleus RNA sequencing to systematically survey cells across the entire adult human brain.
We sampled more than three million nuclei from approximately 100 dissections across the forebrain,
midbrain, and hindbrain in three postmortem donors. Our analysis identified 461 clusters and 3313 subclusters
organized largely according to developmental origins and revealing high diversity in midbrain and
hindbrain neurons. Astrocytes and oligodendrocyte-lineage cells also exhibited regional diversity
at multiple scales. The transcriptomic census of the entire human brain presented in this work provides
a resource for understanding the molecular diversity of the human brain in health and disease.


The mammalian brain controls sensory-motor
function, maintains physiological homeostasis,
and supports cognition and memory. Single-cell
sequencing has revealed extensive cell type
diversity in the entire mouse brain, as well
as regions of the human brain (1-5). We
performed single-nucleus RNA sequencing on
tissue from across
three entire human brains from male do- 
nors (Fig. 1A and Materials and methods). We 
aimed to sample 100 anatomically distinct 
locations (dissections) from the three brains 
in technical duplicate (table S1). We also in- 
cluded one motor cortex dissection from a fe- 
male donor published in a previous work 
(M1C; two technical replicates, 55,591 cells)
(3). We ultimately included 606 high-quality 
samples covering 105 dissections across 10 brain 
regions obtained from four brains (fig. S1 and 
table S2). 

We used graph-based clustering to hierar- 
chically divide cells first into 31 superclusters, 
then 461 clusters, and finally 3313 subclusters 
(Fig. 1, B to D, and Materials and methods). 
We named superclusters based on the litera- 
ture and their regional composition (table S3) 
with one exception. One supercluster con- 
tained neurons from most brain regions, and 
we named these neurons splatter neurons 
based solely on their appearance on the
two-dimensional embedding (Fig. 1B). A
dendrogram of all clusters mirrored previous
mouse work: Neurons split from nonneuronal
cells and into two main clades (Fig. 1E) (1).
One clade contained telencephalic excitatory
neurons expressing SLC17A7; the other
contained telencephalic inhibitory neurons
(SLC32A1+) as well as all excitatory and
inhibitory diencephalic, midbrain, and
hindbrain neurons (Fig. 1E).


Supercluster distributions reflect 
brain development 


The distributions of superclusters across dis- 
sections likely reflected their developmental 
history (Fig. 2A). Most superclusters derived 
from the telencephalon, which indicates that 
we sampled this developmental compartment 
deeply relative to its complexity. Several super- 
clusters corresponded to the six layers of cor- 
tical excitatory neurons: intratelencephalic 
(IT)-projecting [upper-layer and deep-layer
IT (italics indicating supercluster names)],
near-projecting (NP) (deep-layer NP), and 
corticothalamic-projecting (CT/6b) (deep-layer 
CT/6b). Both upper-layer IT and deep-layer IT 
superclusters contained putative layer-four 
clusters. Putative layer-five extratelencephalic 
(ET)-projecting neurons belonged to miscel- 
laneous, along with lymphocytes and other 
rare cell types (~25,000 cells). Two superclus- 
ters corresponded to the cerebral nuclei’s do- 
pamine receptor-expressing medium spiny 
neurons (MSNs) and CASZ1+ eccentric MSNs 
(Fig. 2A; fig. S2, A to F; and Materials and 
methods) (5). Most eccentric MSNs expressed 
only DRD1, but many in the basal ganglia
coexpressed DRD1 and DRD2, suggesting regional
specialization (fig. S2, G and H). Most other
superclusters were region-specific excitatory
neurons.

The presence of three superclusters in mul- 
tiple regions likely reflected migration during 
development. First, cortical interneurons were 
distributed across the telencephalon, reflecting 
their migration from the medial and caudal 
ganglionic eminences (MGE and CGE inter- 
neurons, respectively) (6). Our data suggested 
that CGE interneurons migrate more widely, 
including to the hypothalamus, thalamus, and 
midbrain. This observation might reflect tran- 
scriptomic convergence during development— 
LHX6 generates local inhibitory neurons in 
the hypothalamus (7)—but migration from the 
ganglionic eminences into the thalamus oc- 
curs specifically in humans (8). Second, a super- 
cluster found mainly in the thalamus and 
midbrain likely corresponded to SOX14+
inhibitory neurons that migrate from the
embryonic midbrain to the thalamus (9). Our data
suggested that some of these cells also migrate 
to the hypothalamus and pons; we confirmed 
the latter with in situ hybridization (fig. S3). 
Third, two superclusters appeared to derive 
from the rhombic lip, which produces not 
only the cerebellum but also neurons that 
migrate to specific nuclei in the pons and 
medulla (10).


Dissections reveal supercluster-specific 
anatomical diversity within brain regions 


Because superclusters were enriched in specific 
developmental compartments and regions, we 
wondered whether anatomical relationships 
also explained heterogeneity within super- 
clusters. As expected, telencephalic excitatory 
neurons varied considerably more than inter- 
neurons (fig. S4, A and B, and Materials and 
methods) (11). Nevertheless, even some
broadly distributed clusters were distributed
unevenly across dissections. To determine whether
clusters were distributed according to system- 
atic rules, we examined how dissections with- 
in the same regions related to one another. We 
calculated neighborhood graphs separately for 
the cerebral cortex, cerebral nuclei, and thal- 
amus, connecting each dissection to the two 
dissections with the most similar cluster pro- 
portions. Dissections were often linked to an- 
atomical neighbors, which demonstrates that 
cell type composition broadly reflects spatial 
position within developmental compartments 
(Fig. 2B and fig. S4, C and D). Although im- 
precise dissection might partly drive this re- 
sult in the cerebral nuclei and thalamus (table 
S4), the more unambiguous cortical dissections 
were similarly arranged (Fig. 2B). 
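
The neighborhood-graph construction described above can be sketched in a few lines. The text does not state which similarity measure was used, so Pearson correlation between cluster-proportion vectors is an assumption here, as are the function names and the toy proportions in the usage example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length proportion vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def nearest_dissections(proportions, n_neighbors=2):
    """Link each dissection to the n_neighbors dissections with the most similar profile.

    proportions: dict mapping dissection name -> list of cluster proportions.
    Returns a dict mapping each name to its neighbors, most similar first.
    """
    graph = {}
    for name, vec in proportions.items():
        scored = [
            (pearson(vec, other_vec), other)
            for other, other_vec in proportions.items()
            if other != name
        ]
        scored.sort(reverse=True)
        graph[name] = [other for _, other in scored[:n_neighbors]]
    return graph


# Toy proportions over four clusters (made-up numbers, for illustration only).
toy = {
    "M1C": [0.50, 0.30, 0.10, 0.10],
    "S1C": [0.45, 0.35, 0.10, 0.10],
    "V1C": [0.10, 0.10, 0.60, 0.20],
    "A46": [0.30, 0.30, 0.20, 0.20],
}
graph = nearest_dissections(toy)
```

In this toy input, M1C and S1C have nearly identical profiles and therefore link to each other first, which mirrors the kind of anatomical pairing the graph is meant to expose.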

However, superclusters might distribute dif- 
ferently across brain regions. We therefore 
compared dissections within each superclus- 
ter separately, focusing on the cortex because 
of its relatively high number of dissections and 
superclusters. Neighboring dissections were
still broadly more correlated, but
superclusters exhibited different correlation
patterns and were not always smoothly
distributed across the brain (Fig. 2C and
fig. S4E). Excitatory neurons produced diverse
patterns, which implies different degrees of
anatomical specialization. Upper-layer IT
neurons appeared most specialized and were
distributed distinctively in many dissection
groups like Brodmann areas 44 to 46 (A44-A45,
A46), frontal insula and anterior cingulate
cortex (FI, A24), motor cortex (M1C) and
somatosensory cortex (S1C), and visual cortex
(V1C). Deep-layer NP neurons
were distributed in two major patterns that 
split within the parietal cortex. Some dissec- 
tions consistently grouped together, including 
FI and A24, as well as A44-A45 and A46. The 
entorhinal cortex and paleocortex [medial
entorhinal cortex (MEC), lateral entorhinal
cortex (LEC), piriform cortex (Pir), anterior
olfactory nucleus (AON)], considered the more
evolutionarily conserved part of the cortex 
that contains only three cellular layers, were 
particularly distinct. 

We next explored whether particular clus- 
ters drove these correlation patterns. Whereas 
most interneuron clusters contributed to all 
cortical dissections (Fig. 2D), some excitatory 
clusters were enriched in specific dissections 
that explained previous observations. M1C 
and S1C shared two upper-layer IT clusters
(no. 128 and no. 131) that likely drove their 
high correlation in this supercluster and their 
tight relationship on the neighborhood graph 
(Fig. 2, B and D, and fig. S4E). V1C exhibited
distinctive deep-layer CT/6b and upper-layer
IT clusters (no. 107 and no. 133), including
V1C’s distinctive TRPC3+ layer-four neurons
(12). The entorhinal and piriform cortex con- 
tained distinctive deep-layer IT and deep-layer 
CT/6b clusters as well as amygdala excitatory 
and interneuron clusters (Fig. 2, B and D). 
However, no clusters were uniquely enriched 
in FI and A24 or A44-A45 and A46, which in- 
dicates that these highly correlated dissections 
instead contained similar proportions of more 
broadly distributed clusters. 

Outside the cortex, superclusters and clusters
also distinguished dissections according to
different patterns (fig. S5). Some
superclusters distributed relatively broadly,
including migratory MGE and CGE interneurons
and midbrain-derived inhibitory neurons. By
contrast, many region-specific superclusters
were enriched in specific nuclei. Many
amygdala nuclei contained a relatively
specific amygdala excitatory cluster. The
striatum [caudate (CaB), nucleus accumbens
(NAc), and putamen (Pu)] contained relatively
specific MSN clusters. Thalamic excitatory
clusters were highly enriched in the lateral
geniculate nucleus (LG) and anterior nuclear
complex (ANC). Even within these
superclusters, however, many clusters were
found across multiple dissections (figs. S4A
and S5). Because dissections often overlapped
in these regions, future work will likely
localize an even greater number of cell types
to specific nuclei. Nevertheless, our
observations suggest that individual brain
nuclei are transcriptionally distinct as a
result of both varying cell type proportions
and specialized cell types.


Fig. 2. Superclusters and clusters are differentially distributed across brain
regions. (A) Supercluster distributions across dissections, normalized by column
and colored by region (gray, potential index-hopping and dissection imprecision).
(B) Cortical dissections colored by lobe and connected to the two dissections with
the most similar cluster proportions. (C) Heatmaps showing correlations of cluster
proportions between dissections, in the same order on both axes. (Left) Color bar
indicating cortical lobe colored as in (B). (D) Stacked bar plots per cluster
(excluding clusters with fewer than 100 cells or less than 1% of cells from cortex) showing
the relative contributions of neurons from each cortical dissection, colored according
to the legend. A UMAP embedding is shown for select clusters (gray, noncortical cells).
Clusters are labeled with two of their most enriched genes. Red labels, GABAergic
(SLC32A1); blue labels, excitatory glutamatergic (SLC17A6 or SLC17A7).


Most neuronal diversity is found outside 
the telencephalon 


Splatter neurons comprised a heterogeneous
group of cells from most brain regions,
especially midbrain, hindbrain, and more than
half of all hypothalamic neurons (Fig. 3A and
fig. S6A). Only one splatter cluster contained
cells from across the cortex and likely
represented long-range projecting
γ-aminobutyric acid-releasing (GABAergic)
neurons (Fig. 2D; no. 235 expressing SST and
CHODL) (11). The supercluster expressed
diverse neurotransmitters, containing both
inhibitory and excitatory cells (22% expressed
the GABA transporter SLC32A1, 39% expressed
the glutamatergic transporter SLC17A6, and
1.4% coexpressed these genes; Fig. 3, B and C,
and fig. S6B) as well as cholinergic,
glycinergic, serotonergic, and dopaminergic
neurons (fig. S6C). Illustrating the
complexity of these cells relative to other
superclusters, 92 splatter clusters yielded
1145 subclusters, >900 more than any other
supercluster (Fig. 3D and fig. S6D). Moreover,
nearly twice as many principal components
were required to describe
splatter neurons (Fig. 3E, fig. S6E, and
Materials and methods).
Furthermore, principal components did not
reveal a clear hierarchical structure to the
supercluster and instead primarily
distinguished a specific telencephalic subtype
(fig. S6, F and G). Even within dissections,
the first components in splatter-enriched
regions were less correlated to
neurotransmitter identity compared with
cortical dissections, even when splatter
neurons were analyzed separately (fig. S6, H
and I). We explored whether this atypical
molecular architecture was an artifact of our
methods and observed that splatter neurons
were consistent across donors and did not
stand out in terms of quality metrics (fig.
S6, J and K). Splatter neurons consistently
grouped together after down-sampling, without
donor-batch correction and with different gene
sets (fig. S6, L and M). Moreover, a Gaussian
mixture model (GMM) clustered splatter neurons
together independently of the
cell-neighborhood graph, which implies that
these cells occupy a distinct
transcriptional subspace (fig. S7A).


Fig. 3. Splatter neurons are highly complex. (A) Splatter neurons colored on the t-SNE by cluster.
(B and C) Cells colored by their expression of the gene they express most highly among those in the legend
(gray, no expression). Arrows indicate clusters with high AVP and SLC5A7 expression. (D) For each
supercluster or subset, median cluster size (left) or total subclusters (right) is plotted against total number
of clusters. Splatter is orange. (E) The optimal number of principal components to represent each
supercluster, calculated separately for each donor as indicated by the key. Serotonergic and dopaminergic
indicate splatter subsets. (F) Cells from the parabrachial nuclei colored and labeled by supercluster on
the t-SNE. (G) Distributions of Euclidean distances between cells and their 25th nearest neighbors,
normalized by each dataset’s maximum distance. (H) Splatter cells from the parabrachial nuclei colored
by subcluster on the t-SNE. (I) Same as (B). (J) Serotonergic neurons colored by cluster on the t-SNE.
(K) Dissections that constitute 95% of the serotonergic neurons shown in (J). The node size represents
the number of cells from each dissection, and the edge width represents the percentage of cells that were
neighbors. (L) Expression across serotonergic cells for select genes. (Top bar) Subcluster colored as in
(J). (Second bar) Region colored as in (K). (M) TH+ and dopaminergic neurons colored by subcluster. (N) Same
as (K), but for TH+ neurons. (O) Same as (B). (P) Finer subtypes, indicated by color and marker genes.


We therefore used an alternative factorization
method to identify transcriptional modules
across these cells. Topic modeling [latent
Dirichlet allocation (LDA) (13)] uncovered a
splatter-enriched module that included
neuronal activity-related genes, including
serotonergic receptor HTR2C, voltage-gated
channels SCN7A and SCN9A, and the
actin-organizing protein KLHL1 (fig. S7, B and
C). However, the genes were not highly
specific, and LDA models fit with fewer topics
grouped splatter neurons with other
superclusters (fig. S7D). We therefore
hypothesized that some other superclusters
were similar to splatter neurons, but were
more abundant, and thus were more easily
separated by clustering analysis. To test this
idea, we reanalyzed a down-sampled dataset but
retained only five mammillary body neurons,
which LDA had grouped with splatter neurons.
The five cells clustered with splatter neurons
(fig. S7E). Accordingly, most nontelencephalic
neurons were second-most similar to splatter
neurons (fig. S7F). These observations suggest
that splatter neurons comprise diverse
nontelencephalic cell types that were not
sampled deeply enough to be modeled well in
the current dataset. Some might form their own
superclusters if sampled more deeply; when we
down-sampled our whole dataset by cluster,
splatter neurons were relatively more common
and exhibited more structure on the embedding
(fig. S6L).

Therefore, this supercluster might be better 
understood by considering subsets. We first 
examined whether these neurons appeared 
less complex in specific dissections and ana- 
lyzed the parabrachial nuclei, where we sam- 
pled a high proportion of splatter neurons. 
We found midbrain-derived inhibitory, upper 
rhombic lip, lower rhombic lip, cerebellar 
inhibitory, and splatter neurons (Fig. 3F). Splatter 
neurons were most heterogeneous, exhibit- 
ing greater distances from their 25th nearest 
neighbors than did neurons from other super- 
clusters, whose distances were more like those 
between cortical neurons (Fig. 3G). We found 
147 splatter subclusters with more than five cells 
in the parabrachial nuclei (Fig. 3H). The sub- 
clusters spanned populations known to neigh- 
bor (PHOX2B+) and spatially segregate within 
(FOXP2+, LMX1B+, and PAX5+) the murine 
parabrachial nuclei (14, 15). NR4A2 and LMX1A
more specifically labeled FOXP2+ and LMX1B+
neurons in our data (Fig. 3I). Parabrachial cell
types are therefore extremely diverse but broad- 
ly conserved across mice and humans. 
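
The heterogeneity measure used above, the distance from each cell to its 25th nearest neighbor, can be sketched with a brute-force computation. The function name is hypothetical, the input is assumed to be a list of coordinate tuples (for example, principal-component scores), and normalizing by the largest k-th neighbor distance is one reading of "maximum distance", not a confirmed detail of the study.

```python
from math import dist


def kth_neighbor_distances(points, k):
    """Return each point's Euclidean distance to its k-th nearest neighbor,
    scaled so the largest such distance equals 1.

    Brute force, O(n^2): suitable only for small illustrative inputs.
    """
    raw = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        raw.append(others[k - 1])
    largest = max(raw)
    return [d / largest for d in raw]


# Toy example: three nearby points and one outlier, using k = 2.
scaled = kth_neighbor_distances([(0, 0), (1, 0), (2, 0), (10, 0)], k=2)
```

Cells in sparse, heterogeneous populations yield larger values than cells in dense, homogeneous ones, which is the contrast drawn between splatter and cortical neurons.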

We also explored splatter clusters defined 
by neurotransmitter type, such as serotonin. 
Serotonergic markers FEV and SLC6A4 were 
highly expressed in cluster 397, which com- 
prised 14 subclusters derived primarily from 
the pons, midbrain, medulla, and hypothala- 
mus (Fig. 3, J and K, and fig. S8, A to I). All 
cells expressed FEV except a small cluster of 
20 PAX6+ cells and some PITX2+ hypothal- 
amus cells (Fig. 3L and fig. S8I). As previously 
described in mice, subclusters could be grouped 
into rostral- (HMX3+) and caudal- (HOX+) 
derived populations, and rostral neurons could 
be further split into EN1+ and EN1- populations
(Fig. 3L) (16). Similar to a recent single-cell
study of mouse serotonergic neurons (17),
our analysis revealed combinatorial expression 
of glutamatergic and GABAergic genes in ad- 
dition to neuropeptides like PENK and TRH 
(Fig. 3L and fig. S8, G to I). One subcluster in
our analysis appeared to correspond to the
distinct P2RY1+MET+TACR3+ cells located near
the cerebral aqueduct in mice (17). This sub- 
cluster derived mostly from the periaqueduc- 
tal gray, which suggests that these neurons 
and their specific electrophysiological and pro- 
jection properties are conserved in humans. 

We next investigated splatter cluster 395 that 
expressed dopaminergic markers TH, SLC6A3, 
and DDC and contained mostly midbrain— 
substantia nigra, red nucleus, and periaque- 
ductal gray—and very few hypothalamic neurons 
(Fig. 3, M and N, and fig. S8, J to N). This
analysis yielded three neuronal classes
(TH+SOX6+, TH+CALB1+, and TH+GAD2+) and 14
subtypes (Fig. 3, O and P; fig. S8O; and
Materials and methods). We trained a
classifier on a recent dataset that defines 10
subtypes and two classes (TH+SOX6+ and
TH+CALB1+) of dopaminergic neurons among
NR4A2+ nuclei from the substantia nigra (4).
We found that seven of our 14 neuronal
subtypes corresponded to eight previously
defined dopaminergic neuron subtypes (fig. S8,
P to S, and Materials and methods).
Integration analysis of both datasets
confirmed these results and the presence of
seven additional neuronal subtypes (fig. S8,
T to V, and Materials and methods), including
one in the substantia nigra (TH+SOX6+LPL+)
and four TH+CALB1+ subtypes (PAX5+, NPW+,
GCCR+, or VIP+) in the periaqueductal gray
and ventral tegmental area surrounding the
red nucleus (Fig. 3P and fig. S8O). Similar to
serotonergic neurons, TH+CALB1+ subtypes in
the periaqueductal gray expressed different
combinations of neuropeptides, such as VIP,
NPPC, NPW, and PENK (fig. S8W). In addition,
we found two TH+GAD2+ clusters (CALCR+ or
EBF2+) that defined the molecular signature of
a third class of neurons that coexpress some
dopaminergic and GABAergic markers (18) (fig.
S8, O and X). Our results are thus consistent
with midbrain TH+ neurons exhibiting
transcriptional profiles ranging from a
complete dopaminergic phenotype (12 subtypes
of the TH+SOX6+ and TH+CALB1+ classes) to TH+
neurons with a partial dopaminergic and
GABAergic phenotype (two subtypes of the
TH+GAD2+ class).

Thus, even splatter subtypes exhibit exten- 
sive regional and molecular heterogeneity. In 
fact, serotonergic and dopaminergic neurons 
exhibited similar complexity to some super- 
clusters, despite comprising only about 1000 cells 
(Fig. 3E). Similarly, enhanced electric fluores- 
cence in situ hybridization (EEL-FISH) (Mate- 
rials and methods) revealed GABAergic and 
glutamatergic neurons in the midbrain and 
pons that expressed a number of neuropep- 
tides (fig. S9, B and C: B1-13, C1; and fig. S10). 
We also observed some very rare cell popula- 
tions: In the substantia nigra pars reticulata, 
we found cells coexpressing GHR and SST (fig. 
S9A: A4), whereas in the retrorubral field, we
found cells expressing peptides such as ADCYAP1,
CARTPT, NPX, or NTS (fig. S9B: B10-12). Cells
across the brain stem are therefore defined by 
combinatorial neuropeptide and neurotrans- 
mitter expression. 

Our observations suggest that splatter neu- 
rons cluster together because neurons outside 
the telencephalon are (i) less similar to telencephalic
neurons than to one another and (ii) not
hierarchically organized by neurotransmitter 
identity. This similarity is weak and does not 
imply a biological category. Our previous ado- 
lescent mouse brain atlas did not reveal any 
splatter-like clusters (1). However, we observed 
splatter-like neurons outside the cortex and 
hippocampus when we included cells that had 
initially been excluded from that dataset (fig. 
S11, A to C). The population included adjacent 
glutamatergic, GABAergic, and glycinergic sub- 
populations that expressed genes that our LDA 
model ranked highly for splatter neurons (fig. 
S11D). These neurons were possibly excluded 
previously as a result of their lack of distinct 
markers because most were similar in quality
to other mouse neurons (fig. S11E). Accordingly,
regions like the preoptic hypothalamus
are highly heterogeneous in the mouse (19).
These and our findings underline that the mam- 
malian brain contains highly diverse popula- 
tions that are challenging to cluster and organize 
hierarchically without deeper sampling. 


Oligodendroglia and astrocytes are specific to 
brain regions 


Oligodendrocyte-lineage superclusters in- 
cluded oligodendrocyte precursor cells (OPCs), 
committed oligodendrocyte precursors (COPs), 
and oligodendrocytes and projected onto a single 
uniform manifold approximation and projec- 
tion (UMAP) embedding that resembled line- 
age differentiation (Fig. 4A). A dendrogram 
grouped eight oligodendrocyte clusters into 
two types that expressed OPALIN or RBFOX1, 
as previously described (Oligo1 and Oligo2, respectively)
(Fig. 4, B and C; fig. S12A; and table
S5) (20). One distinct RBFOX1+ cluster largely 
(88%) derived from a single dissection in one 
donor and was likely low quality (fig. S12, B to 
F). RBFOX1+ oligodendrocytes were previously 
postulated to be a mature end state, consistent 
with their UMAP position in our analysis. 
Four OPC clusters were characterized by re- 
gional specificity rather than maturation state 
(Fig. 4A and figs. S12G and S13). Two major 
types, OPC1 and OPC2 (Fig. 4, D and E), were 
enriched within and outside the telencepha- 
lon, respectively, and likely corresponded to 
the NELL1+ cortical and PAX3+ spinal cord 
populations recently described (21). One OPC2
cluster was 54% mammillary nucleus, remi- 
niscent of an oligodendrocyte cluster (fig. S13, 
C to E). OPC1 and OPC2 differentially expressed
128 genes with more than a fourfold
change in either population, including axon-guidance
molecules, region-specific transcription
factors, and Notch signals (fig. S13A and table 
S6). Although we previously found only one 
OPC cluster in the adolescent mouse, reanalysis 
revealed that mouse OPCs expressed type-specific
markers Hes5 and Irx5 mutually exclusively
(fig. S13B). Thus, OPCs exhibit relatively
little transcriptomic variation—few principal 
components explained most of their variance 
(Fig. 3E)—but are not homogeneous across the 
brain. Accordingly, clustering within OPC1 
appeared driven by two expression gradients. 
Telencephalon-specific transcription factor 
FOXG1 was evenly expressed across OPC1,
whereas the secreted protein NELL1 and cell-adhesion
molecule CNTN5 increased smoothly
(fig. S13, F to I). The second gradient was de- 
fined by the proteoglycan GPC5, a marker for 
gray matter astrocytes (Fig. 4E) (22). Telenceph- 
alic OPC heterogeneity might therefore reflect 
interactions with the local environment. 

All regions contained substantial numbers
of both oligodendrocyte types but not both OPC
types, which argues against a simple model
where OPC1 generates Oligo1 and OPC2 generates
Oligo2. For example, OPC1 was largely
absent from the midbrain, but both Oligo1 and
Oligo2 were present. Nevertheless, oligodendrocytes
exhibited regional differences: Oligo1
predominated in the telencephalon; Oligo2
cells progressively increased toward the brain’s
more posterior regions (Fig. 4F). Because Oligo1
was also putatively less mature than Oligo2, 
we investigated whether the differential dis- 
tributions of OPCs and oligodendrocytes along 
the antero-posterior axis might partly reflect 
different rates of oligodendrocyte turnover 
inside and outside the telencephalon. Telenceph- 
alic OPCs expressed more cell cycle-related 
genes than nontelencephalic OPCs, which sug- 
gests proliferation (Fig. 4G). 

We therefore investigated the ratios of OPC, 
COP, and oligodendrocyte populations across 
dissections, reasoning that these ratios would 
reflect relative turnover rates. We found an 
elevated ratio of COPs to OPCs in the telencephalon,
which indicates more frequent initiation
of differentiation (Fig. 4H). The ratio of
oligodendrocytes to OPCs was instead higher
in the brainstem (Fig. 4I; confirmed by EEL-FISH
in Fig. 4J), which indicates lower turnover
of mature cells relative to the progenitor
population. Thus, in our dataset, telencephalic 
OPCs appear more proliferative, differentiate 
more frequently, and form a larger population 
relative to oligodendrocytes. Elevated OPC pro- 
liferation might therefore more frequently re- 
plenish the oligodendrocyte population in the 
telencephalon, lowering the proportion of more 
mature Oligo2 cells. However, Oligo2 is not 
experimentally confirmed as a mature end 
state, and OPCs play a variety of roles in the 
brain. These ratios might reflect other regional 
differences.

Fig. 4. Oligodendrocyte lineage. (A) Oligodendrocytes (RBFOX1+ below dotted orange line), COPs,
and OPCs on a UMAP embedding, grayscale-colored by cluster. Pie charts show regional compositions.
(B) Type 1 and type 2 oligodendrocytes separately re-embedded by t-SNE (left). Differential gene expression
(right) (orange line, g < 10°°). The y axis shows -log10 of the adjusted P values. (C) Oligodendrocytes colored
by gene expression. (D) Same as (B), but for OPCs. (E) OPC gene expression. (F) Proportions of type 1 
and type 2 oligodendrocytes and OPCs found in each region colored as in (A). (G to I) Dissections colored 
by region as in (A). (Left) Telencephalon. (Right) Rest of the brain. Vertical axis represents a dissection’s 
percentage of cycling OPCs (G), ratio of COPs to OPCs (H), or ratio of oligodendrocytes to OPCs (I). Asterisks 
indicate significant differences (Mann-Whitney, P < 10°“). The y axis in (G) was trimmed to 0.01; one 
telencephalic dissection contained 0.04% cycling cells. (J) Ratios of oligodendrocytes to OPCs in each tissue 
section profiled with EEL; SN-RN originated from an anterior position. 


As expected, astrocytes exhibited more sharply 
delineated regional heterogeneity than oligoden- 
drocytes. The mouse has two major astrocyte 
types—telencephalic and nontelencephalic— 
that contain gray matter (Gfap-negative) and 
white matter (Gfap-positive) subtypes (1). We 
found this molecular architecture conserved 
in humans: 13 astrocyte clusters were orga- 
nized into telencephalic and nontelencephalic 
types (types 1 and 2, respectively), each containing
GFAP-low and -high populations (Fig. 5, A
and B, and fig. S14). Type 1 included known
human cortical populations: WIF1+ gray matter,
TNC+ white matter, and LMO2+ interlaminar
astrocytes (2). Interlaminar astrocytes
appeared especially distinct. 

However, our data revealed additional regional
heterogeneity, with astrocyte subtypes
organized according to developmental and anatomical
compartments (Fig. 5, D and E, and
fig. S14, D to F). Mirroring cortical neurons, 
gray matter cortical astrocytes included two 
clusters enriched in the neocortex and in the 
entorhinal and piriform cortex, respectively 
(no. 52 and no. 54). Whereas the former was 
mostly specific to the cortex, the latter was also 
found in the hippocampus and amygdala. 
Striatal astrocytes formed their own distinct 
cluster (no. 55). Globus pallidus astrocytes re- 
sembled midbrain and thalamic astrocytes and 
therefore represented the only major telencephalic
type 2 cells (fig. S14F); other type 2 cells
derived from the diencephalon, midbrain, and 
hindbrain. LDA revealed additional heteroge- 
neity orthogonal to clusters, including a topic 
enriched in the globus pallidus and well represented
by FOS and JUN expression (Fig. 5,
F to H, and fig. S15). These genes are downstream 
of many cellular processes including ischemia, 
which suggests that pathological conditions 
drove the classification of many globus pallidus 
astrocytes. Two other topics (11 and 8) were 
well represented by potential markers for reac- 
tive astrocytes (23) (Fig. 5H and fig. S15D). 
Together, these results emphasize both the re- 
gional heterogeneity of astrocytes and the di- 
verse molecular programs with which they react 
to their environments. 


Discussion 


In this work, we provide an overview of tran- 
scriptomic diversity across the brain. Future 
work must disentangle whether these distinct 
transcriptomic states exhibit different epige- 
nomic and functional properties. Our study 
highlights some of the challenges associated 
with profiling human tissue. Dissections were 
difficult to distinguish and replicate across 
donors, preventing us from definitively map- 
ping cell types to their precise anatomical 
locations (see the “Technical notes on the data- 
set” section in the Materials and methods; table 
S4). We also likely observed pathological states, 
such as ischemia (in astrocytes, for example) 
(Fig. 5), but cannot confirm this observation 
without more extensive donor sampling. Our 
dataset does not permit conclusions about 
donor-specific diversity; how cell types vary by 
age, sex, and behavioral and disease states will 
be exciting avenues for future research. 

Nevertheless, our study design enabled us to 
make important observations about regional 
variation across the brain. We identified a no- 
table gap in our current understanding of 
neuronal diversity outside the telencephalon, 
reflecting these regions’ anatomical complex- 
ity. Splatter neurons expressed neuropeptides, 
neurotransmitters, and other neuronal genes 
in unexpectedly complex patterns. For instance, 
TH+, dopaminergic, and serotonergic neurons 
expressed diverse neuropeptides and a graded 
range of GABAergic markers, generating ex- 
traordinary molecular diversity. This result in- 
dicates that future work must profile individual 
brain nuclei to fully ascertain their cellular 
diversity. 

Our results underscore how strongly region- 
al and developmental origins shape adult tran- 
scriptomic types. Both astrocytes and OPCs 
exhibited telencephalic and nontelencephalic 
types that transitioned near the hypothalamus 
(figs. S12G and S14D). Astrocytes showed ad- 
ditional region-specific types, suggesting spe- 
cialized functions in support of local neurons. 
Some superclusters reflected migratory cell 
types that appeared in multiple dissections. A 
thorough understanding of region-specific cel- 
lular diversity and its development will there- 
fore be key for treating disease. For example, the 
distinctive composition of the telencephalon’s
oligodendrocyte and the midbrain’s dopaminergic
lineages might be relevant for diseases
such as multiple sclerosis and Parkinson’s. Our
work therefore provides a critical foundation
for exploring the brain’s diverse neural circuitry
and its implications for human health.

Fig. 5. Astrocyte diversity. (A to D) t-SNE showing two major astrocyte types (A); GFAP expression (B);
expression of the gene expressed most highly among WIF1, LMO2, and TNC (C); and regions of origin (D).
(E) Brain region similarity by astrocyte diversity. Node size represents number of cells, and edge width
represents percentage of cells in the regions or dissections that were neighbors. BNST, bed nucleus of the
stria terminalis; SEP, septal nuclei; SI, substantia innominata and nearby nuclei. (F to H) (Left) Cells
colored as in (B) by their scores for LDA topics. (Center) Score distributions for each region colored as in (D).
The y axis is trimmed at 5. (Right) Expression of topic-representative genes.


Materials and methods 
Human postmortem tissue specimens 


Deidentified postmortem adult human brain 
tissue was obtained after receiving permission 
from the deceased’s next-of-kin. Tissue collec- 
tion was performed in accordance with the 
provisions of the United States Uniform Ana- 
tomical Gift Act of 2006 described in the 
California Health and Safety Code section 
7150 (effective 1/1/2008) and other applicable 
state and federal laws and regulations. The 
Western Institutional Review Board reviewed 
tissue collection procedures and determined 
that they did not constitute human subjects re- 
search requiring institutional review board (IRB) 
review. The collection and processing of human 
tissue was approved under Swedish law by the 
Swedish Ethical Review Authority (2019-03054). 

Male and female donors 18 to 68 years of 
age with no known history of neuropsychiatric 
or neurological conditions were considered for
inclusion in the study. Routine serological
screening for infectious disease (HIV, hepatitis 
B, and hepatitis C) was conducted using donor 
blood samples and donors testing positive for 
infectious disease were excluded from the study. 
Specimens were screened for RNA quality and 
samples with average RNA integrity (RIN) 
values ≥7.0 were considered for inclusion in
the study. We sought to minimize the post- 
mortem interval, with further exclusionary 
criteria based on evidence of head trauma, in- 
tubation, neuropathology, and homicide. Brain 
specimens meeting these criteria, and for which 
whole brain hemispheres could be obtained 
for the current study, were quite rare and with 
a heavy male bias. Between 2018 and 2022, a 
total of 16 donors met these criteria, with only 
three female donors that ultimately failed qual- 
ity control (QC) criteria. The three donors pass- 
ing all exclusionary and QC criteria were male. 

Postmortem brain specimens were processed 
as previously described (24). Briefly, coronal 
brain slabs were cut at 1-cm intervals, frozen 
in dry ice-cooled isopentane, and transferred 
to vacuum-sealed bags for storage at -80°C
until the time of further use. To isolate regions
of interest, tissue slabs were briefly transferred
to -20°C, and the region of interest was removed
and subdivided into smaller blocks on a custom
temperature-controlled cold table. Tissue blocks
were stored at -80°C in vacuum-sealed bags
until later use. 


Tissue sampling and single-nucleus 
RNA sequencing 


We sampled all but 12 dissections from three 
postmortem donors, although dissections were 
sometimes combined differently across donors 
to achieve this goal, e.g., PAG became PAG-DR 
in one donor. For most dissections, we enriched 
for neurons with fluorescence-activated nuclei 
sorting. Nucleus isolation was conducted as 
described (25). Gating on 4’,6-diamidino-2- 
phenylindole (DAPI) and NeuN fluorescence 
intensity was also described previously (2). 
NeuN+ and NeuN- nuclei were sorted into 
separate tubes and pooled at a defined ratio 
after sorting. Sorted samples were centrifuged, 
frozen in a solution of 1X phosphate-buffered 
saline (PBS), 1% bovine serum albumin (BSA), 
10% dimethyl sulfoxide (DMSO), and 0.5% 
RNAsin Plus RNase inhibitor (Promega, N2611), 
and stored at -80°C until the time of shipment
on dry ice from the Allen Institute to the
Karolinska Institute for 10X chip loading. One 
shipment was accidentally switched for air- 
plane parts by the courier and sent to a com- 
pany in Ireland (we received the airplane parts); 
once the error was discovered and rectified, 
the samples arrived thawed but nevertheless 
yielded data that passed QC and are included 
in the dataset (table S2, notes in column M). 

Immediately before loading on the 10x Chro- 
mium instrument, frozen nuclei were thawed 
in a 37°C water bath, spun down briefly, and 
pipetted several times to mix (26). Nuclei were 
then quantified and loaded according to the 
10X Genomics protocol, targeting 5000 cells. 
We aimed to sequence two replicates per sam- 
ple. Nearly all samples were processed with 
the 10x Genomics V3 kit; 12 samples were se- 
quenced with v3.1 (table S2). Samples were 
first sequenced to a shallow depth (~1000 reads 
per cell) on the Illumina NextSeq platform to 
validate sample concentrations. Samples were 
then sequenced to ~100,000 reads per cell on 
the Illumina NovaSeq platform. Sequencing 
saturation was examined for each sample 
using the preseq package (https://github.com/ 
smithlabcode/preseq). Any samples that were 
not saturated to 60% were sequenced more 
deeply using preseq predictions.


Data preprocessing 


Sequencing runs were demultiplexed with 
cellranger mkfastq version 4.0.0 (10x Genomics) 
and filtered through the index-hopping- 
filter tool version 1.1.0 (10x Genomics). Unique 
molecular identifier (UMI) counts were determined
using STARsolo version 2.7.10a with
the following parameters: --soloFeatures Gene
Velocyto; --soloBarcodeReadLength 0; --soloType
CB_UMI_Simple; --soloCellFilter EmptyDrops_CR
%s 0.99 10 45000 90000 500 0.01 20000 0.01
10000; --soloCBmatchWLtype
1MM_multi_Nbase_pseudocounts; --soloUMIfiltering
MultiGeneUMI_CR; --soloUMIdedup 1MM_CR;
--clipAdapterType CellRanger4;
--outFilterScoreMin 30.

Barcode whitelists were downloaded from 
the 10x Genomics website. Exonic, intronic, and 
ambiguous counts were summed for cluster- 
ing analysis. 

The reference genome and transcript annotations
were based on the human GRCh38.p13
GENCODE v35 primary sequence assembly.
However, we filtered the reference. Because 
our pipeline only counted reads that uniquely 
aligned to one gene, reads that aligned to 
more than one gene were lost. Related genes 
therefore exhibited few or zero counts. To 
minimize this loss, we discarded overlapping 
genes or transcripts that overlapped or map- 
ped to other genes or noncoding RNAs’ 3’ 
untranslated regions (3'UTRs), leaving only one 
of these transcripts in the genomic reference. 
We used BLAST to align the last 400 nucleo- 
tides (3’UTR) of all protein-coding and non- 
coding transcripts to all other genes (maximum 
4 mismatches, minimum alignment length 
300 nucleotides). We resolved all the matches 
by the following procedure: 

First, fusion genes were filtered based on 
their names: genes whose names joined the names
of both fused genes (“gene1-gene2”) were discarded.
Second, noncoding transcripts that
matched a coding transcript were discarded. 
Third, for overlapping transcripts where both 
transcripts were either coding or noncoding: 

(i) If one of the gene names matched the 
pattern “XX######.#” its transcript was discarded. 

(ii) If one of the transcripts belonged to a 
gene with one or more transcripts that were 
already discarded during the procedure, it was 
discarded as well. 

(iii) We discarded transcripts of the gene 
with fewest splice variants. 

(iv) Otherwise, the transcript with a 3'UTR 
that overlapped the other transcript was kept 
in the genomic reference. 

(v) For paralogs, we mapped all related genes 
(that aligned to one another) and selected the 
paralog with most splice variants. All other 
highly similar paralogs were discarded. In spe- 
cial cases, we manually chose the gene. 

Altogether we filtered 387 fusion genes, 1140 
overlapping transcripts, 414 noncoding tran- 
scripts, 1127 coding paralogs, and 350 non- 
coding paralogs. The reference .gtf file and a 
list of the filtered genes and transcripts can 
be found at https://github.com/linnarsson-lab/ 
adult-human-brain. 
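
The two name-based rules in the procedure above can be sketched as follows. This is only an illustration, not the authors' filtering script: the helper names are hypothetical, the reading of “XX######.#” as two capital letters, six digits, a dot, and one digit is an assumption, and a fusion name is assumed to be two known gene names joined by a hyphen.

```python
import re

# Assumed reading of the pattern "XX######.#" (e.g. a clone-style name).
CLONE_STYLE = re.compile(r"^[A-Z]{2}\d{6}\.\d$")


def is_clone_style(name):
    """True if the gene name matches the assumed clone-style pattern."""
    return CLONE_STYLE.match(name) is not None


def is_fusion_name(name, known_genes):
    """True if `name` splits at some hyphen into two known gene names."""
    for i, ch in enumerate(name):
        if ch == "-" and name[:i] in known_genes and name[i + 1:] in known_genes:
            return True
    return False


def fusion_genes(names):
    """Return, in input order, the names that look like 'gene1-gene2' fusions."""
    known = set(names)
    return [n for n in names if is_fusion_name(n, known)]
```

Requiring both halves to be known genes keeps ordinary hyphenated names from being treated as fusions; the overlap and paralog rules depend on the BLAST alignments and are not covered here.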


Quality control 


10x Chromium samples that captured more 
than 15,000 cells were not analyzed further
(table S2). All remaining samples were pooled
with their replicates and analyzed with “cyto- 
graph qc” (https://github.com/linnarsson-lab/ 
adult-human-brain, commit 5f07e59768844 
e3c84fceb8fd7d1993059el13a7e), which uses a 
modified version of DoubletFinder to calculate 
a doublet score for each cell (27). We captured 
a median 6224 cells from each sample and 
then filtered cells based on their total number 
of mRNA molecules—as counted by unique 
molecular identifiers (UMIs)—and percent- 
ages of unspliced RNA, as well as doublet 
scores. Cells with fewer than 800 UMIs, fewer 
than 40% unspliced molecules, or a doublet 
score above 0.3 were removed from further
analysis. These thresholds were determined 
by examining distributions across the data- 
set (fig. S1). 
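
A minimal sketch of the cell-level filter follows, using NumPy rather than the Cytograph code itself. The function and argument names are hypothetical, and the sketch assumes that a higher doublet score marks a likelier doublet, so cells above the cutoff are dropped.

```python
import numpy as np


def qc_mask(total_umis, unspliced_fraction, doublet_score,
            min_umis=800, min_unspliced=0.40, max_doublet=0.3):
    """Return a boolean mask of cells that pass all three filters."""
    total_umis = np.asarray(total_umis)
    unspliced_fraction = np.asarray(unspliced_fraction)
    doublet_score = np.asarray(doublet_score)
    return (
        (total_umis >= min_umis)                # at least 800 UMIs
        & (unspliced_fraction >= min_unspliced) # at least 40% unspliced molecules
        & (doublet_score <= max_doublet)        # assumed: high score = doublet
    )
```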

Total genes captured per cell varied across 
dissections partly due to biological variation 
in cell size. For example, nonneuronal cells are 
small and yield fewer molecules. Although we 
aimed to collect 90% neurons and 10% nonneuronal
cells from each sample, we were not
able to achieve this proportion in many noncortical
regions due to an abundance of oligodendrocytes
(fig. S1B). We also noted variability
across donors that likely reflected tissue qual- 
ity (fig. S1C). 


Clustering 


Cells were clustered using an updated version 
of the Cytograph package (https://github.com/ 
linnarsson-lab/adult-human-brain, commit 
22a83ebcf79a9fdcdff34beb0300aa1695d7039b) 
(1). The command “cytograph process” was run 
with configuration min_umis: 0, doublets_ 
action: None, features: variance, remove_ 
low_quality: False, remove_doublets: False, 
batch_keys: [“Donor”], clusterer: sknetwork, 
steps: nn, embeddings, clustering. Default 
values were used for other parameters. In brief, 
principal components analysis (PCA) was per- 
formed on the most variable genes and used to 
construct a k-nearest neighbors graph, which 
provided input for Louvain clustering (scikit- 
network). Harmony with default parameters 
integrated principal components across donors 
before constructing the graph (28). 
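
The first two steps of this pipeline (PCA, then a k-nearest neighbors graph) can be sketched in plain NumPy. This is a simplified stand-in for illustration, not the Cytograph implementation: gene selection, Harmony integration, and Louvain clustering are omitted, and the function names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def knn_indices(points, k):
    """Indices of the k nearest rows (Euclidean) for each row, self excluded."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]
```

The dense distance matrix is only practical for small inputs; a dataset of this size would need an approximate neighbor search.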

This clustering pipeline was used at four 
levels of analysis: (i) All cells that passed QC 
were pooled into a single dataset and clus- 
tered. (ii) This top-level clustering was split 
by the dendrogram into two groups that corre- 
sponded to neuronal and nonneuronal cells, 
with one exception: lymphocytes clustered with 
neurons. Each of these two groups was sepa- 
rately analyzed. (iii) Within each group, Paris 
clustering was used on the neighborhood 
graph to find “superclusters” for further analy- 
sis (Fig. 1) (29). The Paris dendrogram was ar- 
bitrarily cut so that the superclusters mirrored 
large islands on the t-distributed stochastic 
neighbor embedding (t-SNE). Each supercluster
was separately analyzed to produce “clusters.”
(iv) Each of the 461 clusters was separately 
analyzed to produce “subclusters.” 

At level 3, additional cluster-level quality 
control was performed. A cluster was removed 
from further analysis if: (i) its mean percent- 
age of mitochondrial UMIs was greater than 
the 99th percentile mean (0.056); or (ii) if the 
Cytograph pipeline tagged the cluster with 
two or more of the following auto-annotations: 
NEUR, ASTRO, OLIGO, OPC, MGL, ENDO, 
BERG, or VLMC. Auto-annotations and fur- 
ther information about the auto-annotation 
procedure are available at https://github.com/ 
linnarsson-lab/auto-annotation-ah. After re- 
moving these clusters, the remaining cells in 
each supercluster were reclustered. This pro- 
cedure was repeated until no clusters met 
these criteria. 
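
One pass of the two cluster-level criteria can be sketched as below. The data layout (plain dictionaries) and the function name are hypothetical, and the 0.056 cutoff is taken as quoted, so the mitochondrial values must be stored in the same units. The reclustering loop is not shown.

```python
LINEAGE_TAGS = {"NEUR", "ASTRO", "OLIGO", "OPC", "MGL", "ENDO", "BERG", "VLMC"}


def clusters_to_remove(mean_mito, tags, max_mito=0.056):
    """Return sorted cluster ids that fail either cluster-level criterion.

    mean_mito: dict mapping cluster id -> mean mitochondrial UMI share
    tags:      dict mapping cluster id -> iterable of auto-annotation tags
    """
    flagged = set()
    for cluster, mito in mean_mito.items():
        n_lineages = len(LINEAGE_TAGS & set(tags.get(cluster, ())))
        # Criterion (i): high mitochondrial share; (ii): two or more lineage tags.
        if mito > max_mito or n_lineages >= 2:
            flagged.add(cluster)
    return sorted(flagged)
```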

Finally, clusters and subclusters at levels 3 
and 4, respectively, underwent a merging pro- 
cedure (“cytograph merge”). A cluster that did 
not express at least one statistically enriched 
gene was merged with the nearest cluster on 
the t-SNE. In short, false discovery rate (FDR)- 
corrected P values were calculated by compar- 
ing Cytograph’s enrichment scores with a null 
distribution obtained by permuting cluster 
labels across all cells in the supercluster. 

A t-SNE and dendrogram were calculated 
for the final 461 clusters (Fig. 1). The dendro- 
gram was calculated based on Euclidean dis- 
tances between cluster means for the 2000 
most variable genes. Cluster means were nor- 
malized to median cluster means across the 
dataset and then standardized (scikit-learn 
StandardScaler). 
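
The standardization and distance steps can be sketched in NumPy as follows. The sketch assumes a matrix with one row per cluster and one column per gene, reproduces the per-column scaling that StandardScaler performs, and leaves out the median normalization and the linkage step, whose details are not given here. Function names are hypothetical.

```python
import numpy as np


def standardize_columns(M):
    """Zero-mean, unit-variance scaling per column (population standard deviation)."""
    M = np.asarray(M, dtype=float)
    std = M.std(axis=0)
    std[std == 0] = 1.0  # constant genes become zeros instead of dividing by zero
    return (M - M.mean(axis=0)) / std


def pairwise_euclidean(M):
    """Symmetric matrix of Euclidean distances between the rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```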

The script for splitting data into superclus- 
ters and creating Cytograph punchcards is 
available as split.py at https://github.com/ 
linnarsson-lab/adult-human-brain. 


Annotation 


Superclusters were manually annotated based 
on the literature and on their regional com- 
position. The splatter supercluster was named 
for its splatter-like appearance on the t-SNE. 
The miscellaneous supercluster contained lym- 
phocytes and very rare neuronal cell types (in 
total only 25,000 cells); the sparsity of these 
cells on the neighborhood graph likely drove 
this grouping. 

Clusters were not named with a single anno- 
tation but instead tagged with auto-annotations 
(https://github.com/linnarsson-lab/auto- 
annotation-ah) from four categories: neuro- 
transmitters (auto-annotation-ha/Human_ 
adult/Neurotransmission), neuropeptides (auto- 
annotation-ha/Neuropeptides), class (auto- 
annotation-ha/Human_adult/Class), and subtype 
(auto-annotation-ha/Human_adult/Subtype). 
Neuropeptide-related tags were manually 
compiled based primarily on the NeuroPep 
database (30). Subtype auto-annotations
included manually selected cell types and states
previously described in the literature. To identify 
cortical cell types, we also transferred labels 
from a single-cell analysis of the middle temporal
gyrus (MTG) (31). We performed PCA
(50 components) on our full dataset, trained a 
random forest classifier (scikit-learn,
class_weight='balanced', max_depth=50) on the MTG
labels, and then predicted labels for all cells. 
We labeled each cluster with the mode of its 
constituent cells if two conditions were met: 
more than 0.8 of predicted labels matched the 
mode, and the mean probability of these pre- 
dictions was greater than 0.8. Clusters that did 
not meet these criteria were labeled N/A. 
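
The cluster-labeling rule can be sketched as below. The function name is hypothetical, and “these predictions” is read as the predictions that match the mode, which is an assumption about the wording above.

```python
from collections import Counter


def cluster_label(predicted, probability, min_agreement=0.8, min_confidence=0.8):
    """Return the modal predicted label for one cluster, or "N/A".

    predicted:   predicted label for each cell in the cluster
    probability: classifier probability of each cell's predicted label
    """
    mode, count = Counter(predicted).most_common(1)[0]
    agreement = count / len(predicted)
    matched = [p for lab, p in zip(predicted, probability) if lab == mode]
    confidence = sum(matched) / len(matched)
    if agreement > min_agreement and confidence > min_confidence:
        return mode
    return "N/A"
```

Both comparisons are strict, so a cluster with exactly 80% agreement is left unlabeled.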


Technical notes on the dataset 


Our annotation effort revealed two experi- 
mental caveats of our work. First, distinguish- 
ing anatomical borders between dissections 
was challenging. Dissections were therefore 
not always identical across donors and might 
contain adjacent tissue. Two diencephalic dis- 
sections contained telencephalon, for exam- 
ple, and therefore contributed to telencephalic 
superclusters (HTHso and VA; Fig. 2A). We 
accordingly further annotated each dissection 
to indicate such ambiguities (table S4). 

Second, we sequenced multiple dissections 
together at once, and sequencing reads were 
occasionally misassigned across samples due 
to index hopping (32). Because index hopping 
usually yields cells with very few UMIs, we 
flagged additional index-hopped cells by ex- 
amining each supercluster for samples that 
only contributed cells with unusually few UMIs. 
Note that such samples might alternatively 
represent low-quality cells, so this approach 
does not exclusively identify index-hopped 
cells. For each supercluster, we calculated the 
fifth percentile of the distribution of total 
UMIs per cell. Then we flagged samples if 
they contributed more than one cell to the 
supercluster and 90% of these cells fell below 
this UMI threshold. We identified 64 such 
samples, but most contributed only a few cells. 
Only two samples contributed more than 100 
such cells to a supercluster: 10X249-1 (V2) to 
deep-layer IT; 10X385-3 (GPi) to lower rhombic 
lip. These results suggest that index hopping 
minimally affected the dataset. 
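
The flagging rule can be sketched for a single supercluster as follows. Names are hypothetical, “90% of these cells” is read as at least 90%, and the percentile uses NumPy's default linear interpolation; both choices are assumptions.

```python
import numpy as np


def flag_low_umi_samples(total_umis, sample_ids, percentile=5, min_fraction=0.9):
    """Flag samples whose cells in one supercluster are almost all low-UMI."""
    total_umis = np.asarray(total_umis, dtype=float)
    sample_ids = np.asarray(sample_ids)
    threshold = np.percentile(total_umis, percentile)  # supercluster-wide cutoff
    flagged = []
    for sample in np.unique(sample_ids):
        umis = total_umis[sample_ids == sample]
        # More than one cell, and at least 90% of them below the cutoff.
        if umis.size > 1 and np.mean(umis < threshold) >= min_fraction:
            flagged.append(str(sample))
    return flagged
```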

However, these observations suggest that 
the data should not be used to definitively 
map cell types to dissections. Instead, the dis- 
section labels facilitate a minimal estimate of 
how extensively cell types differ across neigh- 
boring anatomical locations. 


Integrating human and mouse striatal data 


We merged our striatal datasets (CaB, Pu, 
NAC) with striatal data from Saunders et al. 
(5) (striatum at http://dropviz.org/?_state_id_=39c454601e4bd7c2),
retaining genes with homologs in
http://www.informatics.jax.org/downloads/reports/HMD_HumanPhenotype.rpt.
To integrate the data, we selected genes that were
among the top 2000 highly variable genes in 
either dataset. We then used the Cytograph 
pipeline as described under “Clustering” but 
with batch_keys: [“Species”], which prompts 
Harmony to operate along the cell attribute 
“Species” that we added to the joint dataset. 

The script is available as prepare_integration.py 
at https://github.com/linnarsson-lab/adult- 
human-brain. 


RNAscope multiplex fluorescence in situ 
hybridization (mFISH) 


Fresh-frozen human postmortem brain tissues 
were sectioned at 10 µm onto Superfrost Plus
glass slides (Fisher Scientific). Sections were
dried for 10 min at -20°C and then vacuum
sealed in plastic slide boxes and stored at -80°C
until use. The RNAscope multiplex fluorescent 
v2 kit was used per the manufacturer’s in- 
structions for fresh-frozen tissue sections (ACD 
Bio), except that fixation was performed for 
60 min in 4% paraformaldehyde in 1X PBS at 
4°C and protease treatment was shortened 
to 15 min. Sections adjacent to those used for 
RNAscope were fixed for 15 min in 4% para- 
formaldehyde in 1X PBS, briefly washed in 
PBS, and stained with NeuroTrace 500/525 
Green Fluorescent Nissl Stain and DAPI to 
aid in region localization. Nissl-stained sections 
were imaged using a 10X objective on a Nikon 
TiE fluorescence microscope equipped with 
NIS-Elements Advanced Research imaging 
software (v4.20, RRID:SCR_014329). RNAscope 
sections were imaged using a 40X oil immer- 
sion lens on the same microscope. For mFISH 
experiments, positive cells were called by man- 
ually counting RNA spots for each gene. Cells 
were called positive for a gene if they contained
≥5 RNA spots for that gene. Lipofuscin
autofluorescence was distinguished from RNA 
spot signal based on the larger size of lipo- 
fuscin granules and broad fluorescence spec- 
trum of lipofuscin. Staining for the probe 
combination was repeated with similar results 
on at least three separate sections from one 
human donor. Images were assessed with FIJI 
distribution of ImageJ v1.52p and Nikon NIS- 
Elements imaging software. Probes used were 
Hs-OTX2-C3 (no. 484581-C3), Hs-SOX14-C2 
(no. 1055351-C2), and Hs-CASR (no. 411931) 
from ACD Bio. 


Assessing anatomical heterogeneity 
within superclusters 


For each dissection that contributed more than 
100 cells to a neuronal supercluster, we calcu- 
lated the fraction of cells that derived from 
each cluster in the supercluster. We calculated 
this “cluster-proportion matrix” separately for 
each supercluster, excluding miscellaneous
immune-cell clusters. Figure S4B shows these 
matrices as heatmaps. We used the matrices to
plot fig. S4A, counting for each cluster the
number of dissections that comprised at least 
5% cells from that cluster. Each cluster is thus 
its own dot on the plot. 
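
The cluster-proportion matrix and the per-cluster dissection count described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the (dissection, cluster) input format, and the toy data are assumptions.

```python
from collections import Counter, defaultdict

def cluster_proportions(cells, min_cells=100):
    """cells: iterable of (dissection, cluster) pairs for one supercluster.

    Returns {dissection: {cluster: fraction}} for dissections that
    contributed more than `min_cells` cells to the supercluster.
    """
    counts = defaultdict(Counter)
    for dissection, cluster in cells:
        counts[dissection][cluster] += 1
    proportions = {}
    for dissection, per_cluster in counts.items():
        total = sum(per_cluster.values())
        if total > min_cells:
            proportions[dissection] = {c: n / total for c, n in per_cluster.items()}
    return proportions

def dissections_per_cluster(proportions, min_fraction=0.05):
    """For each cluster, count dissections with at least `min_fraction` of cells from it."""
    tally = Counter()
    for per_cluster in proportions.values():
        for cluster, fraction in per_cluster.items():
            if fraction >= min_fraction:
                tally[cluster] += 1
    return tally

# Hypothetical toy data: dissection "C" has too few cells and is dropped.
cells = ([("A", "c1")] * 150 + [("A", "c2")] * 50
         + [("B", "c1")] * 4 + [("B", "c2")] * 196
         + [("C", "c1")] * 80)
props = cluster_proportions(cells)
tally = dissections_per_cluster(props)
```

Each entry of `tally` corresponds to one dot in a plot like fig. S4A.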

These cluster-proportion matrices were also 
used to calculate neighborhood graphs (Fig. 
2B and fig. S4, C and D). Each dissection was 
connected to the two dissections with the most 
similar cluster proportions. Nonneuronal cells 
were excluded, because similar proportions of 
neurons and nonneuronal cells were not sorted 
for all samples. Similarity was calculated using 
the sklearn NearestNeighbors function, metric= 
“correlation.” 
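
As a rough sketch of this neighbor search, the example below finds the two most similar dissections under correlation distance (1 minus the Pearson correlation). It uses plain NumPy as a stand-in for the sklearn NearestNeighbors call named above; the function name and the toy proportions are assumptions.

```python
import numpy as np

def nearest_by_correlation(proportions, k=2):
    """proportions: (n_dissections, n_clusters) array of cluster proportions.

    Returns an (n_dissections, k) array of neighbor indices, nearest first.
    """
    distance = 1.0 - np.corrcoef(proportions)
    np.fill_diagonal(distance, np.inf)  # a dissection is not its own neighbor
    return np.argsort(distance, axis=1)[:, :k]

# Hypothetical proportions for four dissections and three clusters.
proportions = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
])
neighbors = nearest_by_correlation(proportions)
```

Connecting each row index to its two entries in `neighbors` gives the edges of the neighborhood graph.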

Correlations between dissections were visu- 
alized (fig. S4E) by generating correlation 
matrixes from the cluster-proportion matrices 
(the numpy corrcoef function). To generate 
the final heatmap in the figure panel, which 
indicates dissections that were always more 
correlated to one another than to any other 
dissections, mean and standard deviation were 
calculated for each row of the correlation matrix. 
Then dissections in each row were flagged if 
their correlation values were more than 1.5 
standard deviations away from the mean in all 
seven cortical superclusters. We moreover en- 
forced symmetry: dissections must have been 
flagged for one another to be indicated on the 
heatmap. 
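
The flagging rule can be sketched as below, under two stated assumptions that the text does not settle: "away from the mean" is read as above the mean (more correlated), and self-correlations on the diagonal are ignored when computing each row's mean and standard deviation. Function names and toy matrices are hypothetical.

```python
import numpy as np

def flag_consistent_pairs(proportion_matrices, n_sd=1.5):
    """proportion_matrices: one (n_dissections, n_clusters) array per supercluster.

    A pair is flagged only if it exceeds the row threshold in every
    supercluster and in both directions (symmetry).
    """
    flagged = None
    for proportions in proportion_matrices:
        corr = np.corrcoef(proportions)
        np.fill_diagonal(corr, np.nan)  # ignore self-correlation (assumption)
        mean = np.nanmean(corr, axis=1, keepdims=True)
        sd = np.nanstd(corr, axis=1, keepdims=True)
        high = corr > mean + n_sd * sd
        flagged = high if flagged is None else flagged & high
    return flagged & flagged.T

def toy_matrix(shared):
    """Eight dissections, each dominated by its own cluster, except that
    dissection 1 shares its dominant cluster with dissection 0."""
    m = np.full((8, 8), 0.01)
    m[np.arange(8), np.arange(8)] = 0.93
    m[1] = 0.01
    m[1, 0] = shared
    m[1, 1] = 1.0 - shared - 0.06
    return m

flagged = flag_consistent_pairs([toy_matrix(0.90), toy_matrix(0.85)])
```

In the toy data only dissections 0 and 1 are flagged for one another.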


Assessing supercluster complexity with PCA 


We used a permutation test to determine the 
number of principal components necessary to 
describe each supercluster, reasoning that this 
value would be proportional to the superclus- 
ter’s complexity. We performed PCA twice on 
cells sampled from each supercluster (scikit- 
learn PCA, n_comps=50): once on the original 
expression matrix, and once on a permuted ex- 
pression matrix, where the expression values of 
each gene were randomly redistributed across 
cells. We compared the percentage of variance 
that each principal component explained (at- 
tribute “explained_variance_ratio_”) in the 
original and permuted matrices. We considered
the first component where explained_variance_ratio_
was similar between the two
analyses (less than 10% higher in the unper- 
muted analysis than in the permuted) to re- 
present the optimal number of components to 
describe the supercluster (Fig. 3E). 
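
A minimal sketch of this permutation test follows. It is not the authors' permuted_pca.py: PCA is done with a NumPy SVD rather than scikit-learn, "less than 10% higher" is read as a relative difference, and the function names and toy matrix are assumptions.

```python
import numpy as np

def explained_variance_ratio(x, n_components):
    """Fraction of total variance explained by each leading principal component."""
    centered = x - x.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return (variance / variance.sum())[:n_components]

def components_needed(x, n_components=50, tolerance=0.10, seed=0):
    """Index (1-based) of the first component whose explained variance is
    less than `tolerance` higher in the real data than in the permuted data."""
    rng = np.random.default_rng(seed)
    # Shuffle each gene (column) independently across cells (rows).
    permuted = np.column_stack([rng.permutation(column) for column in x.T])
    observed = explained_variance_ratio(x, n_components)
    null = explained_variance_ratio(permuted, n_components)
    similar = observed < (1.0 + tolerance) * null
    return int(np.argmax(similar)) + 1 if similar.any() else n_components

# Toy matrix: 300 cells x 40 genes driven by three latent factors plus noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 3))
loadings = np.zeros((3, 40))
loadings[0, :14] = 1.0
loadings[1, 14:27] = 1.0
loadings[2, 27:] = 1.0
toy = factors @ loadings + 0.5 * rng.normal(size=(300, 40))
n_needed = components_needed(toy, n_components=20)
```

With three strong factors, the first three components clearly exceed the permuted baseline and the fourth does not, so the procedure stops at the fourth component.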

This procedure was performed separately 
for each donor. 10,000 cells were randomly 
sampled from each donor for the analysis. 
Some superclusters or subsets did not contain 
10,000 cells from a donor. From H18.30.002: 
Deep-layer NP, Hippocampus CA4, Mammil- 
lary body, Cerebellar inhibitory, COP, Epen- 
dymal, Vascular, Bergmann, Fibroblast, Choroid 
plexus, Serotonergic, Dopaminergic. From 
H19.30.001: Deep-layer NP, Miscellaneous, 
Hippocampus CA4, Mammillary body, Cerebellar
inhibitory, COP, Ependymal, Vascular,
Bergmann, Fibroblast, Choroid plexus, Serotonergic,
Dopaminergic. From H19.30.002: Deep-layer NP,
Miscellaneous, Hippocampus CA4,
Mammillary body, Cerebellar inhibitory, Lower
rhombic lip, COP, Ependymal, Vascular, Berg- 
mann, Fibroblast, Choroid plexus, Serotonergic, 
Dopaminergic. 

The script is available as permuted_pca.py 
at https://github.com/linnarsson-lab/adult- 
human-brain. 


Correlating principal components with 
GABAergic identity 


Neurons from all dissections were analyzed 
separately using the Cytograph parameters 
described under “Clustering.” For each dis- 
section, we calculated the absolute value of 
the correlation between the first principal 
component and the log of GAD2 expression. 
For dissections with more than 5000 splatter 
neurons, we additionally analyzed the splatter 
neurons separately and calculated the correla- 
tion value with the same approach. 
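
The correlation measure can be illustrated as follows. This sketch assumes a plain SVD-based PCA on a toy matrix instead of the Cytograph pipeline, and a log(x + 1) transform for GAD2 counts; the pseudocount, function name, and simulated data are assumptions.

```python
import numpy as np

def pc1_marker_correlation(expression, marker_counts):
    """Absolute Pearson correlation between PC1 scores and log marker expression."""
    centered = expression - expression.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1_scores = u[:, 0] * s[0]
    log_marker = np.log1p(marker_counts)  # log(x + 1); pseudocount is an assumption
    return float(abs(np.corrcoef(pc1_scores, log_marker)[0, 1]))

# Simulated dissection: 100 GABAergic and 100 non-GABAergic neurons, 20 genes.
rng = np.random.default_rng(0)
is_gabaergic = np.repeat([True, False], 100)
expression = rng.normal(size=(200, 20))
expression[is_gabaergic, :10] += 3.0
gad2_counts = rng.poisson(np.where(is_gabaergic, 20.0, 0.2))
r = pc1_marker_correlation(expression, gad2_counts)
```

A value of `r` near 1 indicates that the main axis of variation in the dissection separates GABAergic from other neurons.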


Assessing robustness of the splatter supercluster 


We down-sampled our whole-brain dataset by 
randomly selecting 10, 100, or 1000 cells per 
cluster. We then analyzed each down-sampled 
dataset using the Cytograph parameters de- 
scribed under “Clustering.” 
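
Down-sampling to a fixed number of cells per cluster might look like the sketch below. It is not the authors' downsample_data.py; the function name is hypothetical, and keeping clusters smaller than the target whole is an assumption.

```python
import numpy as np

def downsample_per_cluster(cluster_labels, n_per_cluster, seed=0):
    """Return sorted indices of up to `n_per_cluster` randomly chosen cells per cluster."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(cluster_labels)
    keep = []
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        size = min(n_per_cluster, members.size)  # small clusters are kept whole (assumption)
        keep.append(rng.choice(members, size=size, replace=False))
    return np.sort(np.concatenate(keep))

# Toy labels: three clusters with 500, 50, and 5 cells.
labels = np.repeat([0, 1, 2], [500, 50, 5])
kept = downsample_per_cluster(labels, n_per_cluster=10)
```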

We performed additional analyses with the 
dataset generated by sampling 100 cells per 
cluster. We analyzed it using the Cytograph 
parameters described under “Clustering” but: 
(i) without Harmony (batch_keys: []); (ii) with 
the 2000 most variable genes in the splatter 
supercluster; (iii) with 2000 randomly selected 
genes. 

The script to prepare these variations of the 
dataset for Cytograph analysis is available as 
downsample_data.py at https://github.com/ 
linnarsson-lab/adult-human-brain. 

Lastly, we performed additional analyses 
with the dataset generated by sampling 10 cells 
per cluster. We (i) reclustered the data by fit- 
ting the Cytograph-generated principal compo- 
nents with a Gaussian Mixture Model [sklearn 
GaussianMixture(n_components=10, n_init=50)]; 
(ii) performed LDA as detailed in the section 
below, changing k until a splatter-enriched 
topic appeared at k = 11; (iii) removed all but 
five mammillary body neurons, selected the 
top 2000 most variable genes previously deter- 
mined by Cytograph, normalized the data to 
the median of total UMIs per cell, calculated 
50 principal components with sklearn, and 
recalculated the t-SNE using Cytograph’s 
art_of_tsne function; (iv) used the sklearn
KNeighborsClassifier to predict the superclus- 
ter for each cell, restricting the function “fit” to 
use five of the cell’s neighbors outside its own 
supercluster to make the prediction. We then 
calculated for each supercluster the fraction of 
predictions that fell in each supercluster. 
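
Step (iv) can be sketched as a majority vote over each cell's five nearest neighbors that belong to a different supercluster. The version below is a brute-force NumPy stand-in for the sklearn KNeighborsClassifier named above; the function names, the use of Euclidean distance, and the one-dimensional toy coordinates (standing in for principal components) are assumptions.

```python
import numpy as np
from collections import Counter

def predict_from_other_superclusters(coords, labels, k=5):
    """Predict each cell's supercluster from its k nearest neighbors
    that lie outside the cell's own supercluster."""
    labels = np.asarray(labels)
    predictions = []
    for i, point in enumerate(coords):
        outside = np.flatnonzero(labels != labels[i])
        distances = np.linalg.norm(coords[outside] - point, axis=1)
        nearest = outside[np.argsort(distances)[:k]]
        votes = Counter(labels[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

def prediction_fractions(labels, predictions):
    """For each supercluster, the fraction of its cells predicted as each supercluster."""
    labels = np.asarray(labels)
    fractions = {}
    for supercluster in np.unique(labels):
        counts = Counter(predictions[labels == supercluster].tolist())
        total = sum(counts.values())
        fractions[str(supercluster)] = {name: n / total for name, n in counts.items()}
    return fractions

# Toy data: three groups along one axis; "A" and "B" are close, "C" is far away.
offsets = np.linspace(0.0, 0.09, 10)
coords = np.concatenate([offsets, 1.0 + offsets, 5.0 + offsets])[:, None]
labels = np.repeat(["A", "B", "C"], 10)
predictions = predict_from_other_superclusters(coords, labels)
fractions = prediction_fractions(labels, predictions)
```

The resulting table shows which other supercluster each supercluster most resembles.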


Topic modeling with LDA


We used tomotopy version 0.12.2 to perform 
topic modeling with LDA. In brief, we trained 
a model with a specified number of topics “k” 
until perplexity stabilized (function “LDAModel” 
with parameters alpha=50/k, eta=0.1). We 
trained with no more than 200,000 cells and 
all protein-coding genes in the genome that 
were expressed in at least 100 cells and fewer 
than 60% of all cells. Some topics were donor- 
specific or quality-specific, e.g., mitochondrial
or ribosomal, and some did not have an apparent
biological meaning. We therefore identified
topics of interest manually. We also
identified representative genes for each topic 
by using the topic probabilities reported for 
each gene (function “get_topic_word_dist”). 
We filtered genes by specificity (topic prob- 
abilities for each gene normalized by that 
gene’s probability across all topics) and then 
sorted the remaining genes by their unnor- 
malized probabilities. 
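
The gene-ranking step can be sketched as follows, assuming the per-topic gene probabilities have already been collected into a topics-by-genes array. The specificity cutoff is not stated in the text, so `min_specificity` is a hypothetical parameter, as are the function name and toy values.

```python
import numpy as np

def representative_genes(topic_word, genes, topic, min_specificity, n_top=10):
    """Filter genes by specificity for `topic`, then rank by unnormalized probability.

    topic_word: (n_topics, n_genes) array of per-topic gene probabilities.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    # Specificity: a gene's probability in each topic divided by its sum over topics.
    specificity = topic_word / topic_word.sum(axis=0, keepdims=True)
    candidates = np.flatnonzero(specificity[topic] >= min_specificity)
    order = candidates[np.argsort(topic_word[topic, candidates])[::-1]]
    return [genes[i] for i in order[:n_top]]

# Toy distribution: gene_a is abundant in both topics and therefore unspecific.
topic_word = [[0.50, 0.30, 0.15, 0.05],
              [0.50, 0.05, 0.05, 0.40]]
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
top_topic0 = representative_genes(topic_word, genes, topic=0, min_specificity=0.6)
top_topic1 = representative_genes(topic_word, genes, topic=1, min_specificity=0.6)
```

Filtering before sorting keeps highly expressed but ubiquitous genes from dominating every topic.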

A script is available as optimize_lda.py at 
https://github.com/linnarsson-lab/adult- 
human-brain. 


Extended analysis of dopaminergic and 
TH+ neurons 


To identify additional dopaminergic subtypes, 
subclusters 1870, 1871, 1873, and 1876 were 
reanalyzed separately as described under 
“Clustering.” 

To compare our data with a recently pub- 
lished census of midbrain dopaminergic neu- 
rons from Kamath et al. (4), we trained a 
random forest classifier on nuclei from their 
dataset annotated as dopaminergic neurons 
from control samples with nonneurological 
disorders (donor IDs: 3298, 3322, 3345, 3346, 
3482, 4956, 5610, 6173). We trained the clas- 
sifier with features that Cytograph selects when 
run with the option to select cluster-enriched 
rather than variable genes (default parameters 
except n_factors: 20 and features: enrichment).
The scikit-learn RandomForestClassifier was 
regularized with the parameters max_features 
and min_samples_leaf. Five-fold cross-validation 
was performed on data from Kamath et al. 
with scikit-learn StratifiedShuffleSplit for
max_features 0.3, 0.4, 0.5 and min_samples_leaf
between 3 and 50. We selected max_features=0.3
and min_samples_leaf=4 to transfer
labels to our data; oob_score was set true,
and other parameters were left as default. 

We also used Harmony to integrate our TH+ 
neurons and 1000 randomly sampled neurons 
from Kamath et al. We selected features for 
integration by using Cytograph’s FeatureSelectionByVariance
with default parameters
on each dataset separately, and then taking
the intersection of the two gene sets.
Clusters were obtained using Cytograph’s 
UnpolishedLouvain with default parameters 
on the harmonized principal components. We
then transferred labels from Kamath et al. to
our TH+ neurons by using NNDescent on the
harmonized principal components. A nearest- 
neighbor graph was constructed using NNDescent 
with default parameters. Cell types defined by 
Kamath et al. were considered for transfer 
if they contributed more than 30% to one 
integrated cluster. Cells were assigned the 
most common label among their five nearest 
neighbors. 


EEL-FISH 


EEL-FISH was performed as recently described 
(72). Donor H18.30.002 was used for the VIC 
sample; H20.30.002 for the AMY, MTG, and 
A46 samples; H19.30.001 for the pons sample; 
and H18.30.001 for the SN-RN and RRF sam- 
ples. For all figure panels, genes were only 
plotted for an individual cell if the cell con- 
tained more than two molecules of that gene. 
Expression below this threshold was consid- 
ered experimental noise. 

We manually selected all 445 genes: 168 
previously described markers for forebrain 
neurons and nonneuronal cells and 277 genes 
selected from our data. The latter were ex- 
pressed in subtypes of splatter cells (200 genes), 
OPCs (40 genes), and astrocytes (37 genes). 
Genes were selected only if their mean ex- 
pression was more than 5 UMIs in the target 
cluster. 

Nuclei were segmented using CellPose based 
on the DAPI signal, and the segmentation 
masks were expanded by 8 µm without overlapping
(33). The RNA dots were only assigned
to a segmentation mask if they were found 
inside. 


Reanalysis of adolescent mouse brain data 


Adolescent mouse data were downloaded from 
mousebrain.org, including all L1 files from the 
amygdala, cerebellum, cortex, hippocampus, 
hypothalamus, medulla, midbrain, pons, striatum, 
and thalamus. Neuronal cells were reanalyzed 
with Cytograph as described under “Clustering,” 
but with batch_keys: []. Zeisel et al. (1) removed 
a large number of cells, for several reasons. 
Most of the removed cells (>200,000 out of 492,949 cells)
were oligodendrocytes, removed because they 
otherwise would skew the gene selection es- 
pecially in hindbrain. Another ~100,000 cells 
were removed as suspected doublets, and as 
clusters without unique markers. The proce- 
dure was complex, proceeded in several stages, 
and is described in great detail in the original
publication (1). We now realize that many
cells removed in that analysis probably cor- 
respond to splatter neurons in the present 
work. Chromium v1 was 10 times less sensi- 
tive than current v3 chemistry (i.e., ~2000 UMI 
counts per cell for v1 versus ~20,000 for v3, 
even though we used whole cells then and 
only nuclei now). Clustering was therefore extremely
difficult, and clusters without distinct
markers were removed in an effort to be
conservative. 


Differential expression testing 


5000 cells were randomly sampled from each 
of two types in each of three donors. Counts 
were summed within each group to generate 
six pseudobulk samples. DESeq2 was used to test 
for differential expression across the samples. 
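
The pseudobulk step can be sketched as below. The DESeq2 test itself runs in R and is not shown; the function name, the array layout, and the toy data are assumptions.

```python
import numpy as np

def pseudobulk(counts, cell_type, donor, n_cells=5000, seed=0):
    """Sum counts over up to `n_cells` randomly sampled cells per (type, donor) group.

    counts: (n_cells_total, n_genes) array of raw counts.
    Returns {(type, donor): summed count vector}.
    """
    rng = np.random.default_rng(seed)
    cell_type = np.asarray(cell_type)
    donor = np.asarray(donor)
    samples = {}
    for t in np.unique(cell_type):
        for d in np.unique(donor):
            members = np.flatnonzero((cell_type == t) & (donor == d))
            size = min(n_cells, members.size)
            chosen = rng.choice(members, size=size, replace=False)
            samples[(str(t), str(d))] = counts[chosen].sum(axis=0)
    return samples

# Toy data: two types x three donors x ten cells, every count equal to 1.
cell_type = np.repeat(["type_1", "type_2"], 30)
donor = np.tile(np.repeat(["donor_1", "donor_2", "donor_3"], 10), 2)
counts = np.ones((60, 4), dtype=int)
samples = pseudobulk(counts, cell_type, donor, n_cells=5)
```

The six summed vectors would form the columns of the count matrix passed to the differential expression test.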

Code is available in a notebook at https:// 
github.com/linnarsson-lab/adult-human-brain. 


Gene ontology enrichment analysis 


The gget package (https://github.com/pachterlab/
gget) was used to perform all gene ontology
enrichment analysis (function “enrichr” with 
database=“ontology”) (34).


INTRODUCTION: The human neocortex is generally organized into six layers of
neurons but the size and cellular composition of these layers varies across
the cortex, and this variation is thought to underlie differences in
connectivity that impart specific functional specialization to distinct
cortical areas. However, the degree to which cortical areas have a canonical
versus noncanonical organization has proved difficult to reliably quantify.
Single-nucleus and spatial transcriptomic methods enable high-resolution
characterization of the cellular structure of the human neocortex providing a
means to quantitatively compare the molecular and cellular structure and
specialization of distinct cortical areas.


RATIONALE: Eight cortical areas that are representative of major variation in
cellular architecture and include primary sensory and association cortices
were sampled using single nucleus transcriptomics to generate a dataset
comprised of more than 1.1 million nuclei.


[Figure panels: (A) Cross-areal consensus taxonomy; (B) excitatory:inhibitory
ratio, neuronal proportions, and neuronal expression along the rostral (R) to
caudal (C) axis; V1-specific L4 IT types.]

Regional specializations of human cortical cell types. (A) Single nucleus RNA-sequencing data from
eight areas of the human neocortex were used to generate a cross-areal taxonomy with shared and
area-specific types. (B) The relative number of excitatory and inhibitory neurons is similar across all areas
except V1. Neuronal cell-type proportions and gene expression varied systematically from the rostral (R)
to caudal (C) cortex with additional regional signatures. (C) Some excitatory neuron types that are exclusive
to V1 are in the visual input layer 4, which is expanded compared to other cortical areas.


RESULTS: Nuclei were grouped based on gene 
expression similarity into one of 24 cellular sub- 
classes, which were found in all cortical areas. 
Layer 4 intratelencephalic excitatory neurons 
were present even in agranular areas that lacked 

a histologically distinct layer 4, suggesting a 
common subclass-level cellular blueprint across 
the cortex. However, gene expression and sub- 
class proportions varied substantially between 
cortical areas, with more differences in excit- 
atory projection neurons than inhibitory neu- 
rons. All non-neuronal subclasses were shared 
across cortical areas but their laminar distri- 
butions varied between areas, and astrocytes 
also expressed regional marker genes. Varia- 
tion as a function of rostrocaudal location in 
the cortex was a clear organizational feature
where neighboring cortical areas were most 
similar, in line with previous observations of 
gene expression similarity by topographic prox- 
imity in the cortex. At a finer cell-type level of 
analysis, area-enriched and area-specific cell 
types were apparent in multiple cortical areas,
but most notably in the primary visual cortex 
(V1) that had many distinct excitatory neuron 
types and several distinct inhibitory neuron 
types that reflect the specialized cellular archi- 
tecture of this area in humans and other pri- 
mates. V1 specialized inhibitory cell types were 
mostly Somatostatin-expressing neurons likely 
originating from the medial ganglionic emi- 
nence during development. Layer 4 in V1, which 
is visibly enlarged and has multiple sublayers, 
was notably different from other areas with 
discrete sublaminar distributions of specialized 
excitatory and inhibitory neurons revealed by 
spatial transcriptomics. 


CONCLUSION: A common set of cell types are 
found across human cortical areas that have 
diverse functions. Excitatory projection neu- 
rons exhibit large spatial gradients and regional 
differences in proportions, laminar distributions, 
and gene expression that are less pronounced 
in inhibitory neurons or non-neuronal cells. V1 
is molecularly distinct from other cortical areas 
and several excitatory and inhibitory neuronal 
types are found only in V1. The ratio of excitatory
to inhibitory neurons in V1 is also more
than double that of other cortical areas, reflecting
specialization of the human cortex for processing
visual information.


Transcriptomic cytoarchitecture reveals principles
of human neocortex organization


Variation in cytoarchitecture is the basis for the histological definition of cortical areas. We used single
cell transcriptomics and performed cellular characterization of the human cortex to better understand 
cortical areal specialization. Single-nucleus RNA-sequencing of 8 areas spanning cortical structural 
variation showed a highly consistent cellular makeup for 24 cell subclasses. However, proportions of 
excitatory neuron subclasses varied substantially, likely reflecting differences in connectivity across 
primary sensorimotor and association cortices. Laminar organization of astrocytes and oligodendrocytes 
also differed across areas. Primary visual cortex showed characteristic organization with major 
changes in the excitatory to inhibitory neuron ratio, expansion of layer 4 excitatory neurons, and 
specialized inhibitory neurons. These results lay the groundwork for a refined cellular and molecular 
characterization of human cortical cytoarchitecture and areal specialization. 


Areal parcellation of the neocortex is premised
on the idea that structural variations
in cellular architecture (1-3) and
myeloarchitecture (4) underlie functional
divisions [reviewed in (5)]. Neocortex
has a 6-layered organization common across 
species and areas, apart from agranular areas 
such as the primary motor cortex (M1) that 
lack layer 4. Cortical layers contain projection 
neurons with generally stereotyped input and 
output properties hypothesized to represent a 
“canonical” circuitry (6, 7). However, cortical 
areas differ in positional topography, shape 
and size, laminar and columnar organization, 
and neuron proportions (8-10). 

Advances in single cell transcriptomics have 
revealed a complex, hierarchical cortical cell- 
type architecture based on gene expression 
signatures that is conserved across species 
except at the finest cell-type distinctions (7-15). 
Prior work in M1 established a cellular hierar- 
chy consisting of 24 neuronal and non-neuronal 
subclasses with distinct laminar patterning
and correlated phenotypic properties (table
S1) and revealed deeper cellular complexity in 
any given cortical area than previously appreciated
(12, 13, 15-19). The current study aims to
quantitatively define the cellular architecture of 
eight human neocortical areas representative of 
topographic, functional, and structural variation, 
using single nucleus RNA-seq (snRNA-seq) and 
spatial transcriptomics methods. 


Within-area cell taxonomies demonstrate 
common subclass architecture 


To sample major axes of cortical variation, we 
analyzed eight neocortical areas that included 
M1 and primary somatosensory (S1), auditory 
(A1), visual (V1) and association areas [dorsolateral
prefrontal cortex (DFC), anterior cingulate
cortex (ACC), middle temporal gyrus
(MTG), and angular gyrus (AnG)], which spanned
the rostral to caudal (anterior to posterior in 
many mammals) extent of the cortical sheet, 
and represented major variations in cortical 
cytoarchitecture (Fig. 1A) (20). Cortical areas 
were identified across tissue donors using a 
combination of surface anatomical landmarks 
and histological verification of cytoarchitecture 
(Methods). Human postmortem brain samples 
were collected from 5 individuals (3 males, 
2 females, table S2). Tissue photographs taken 
at the time of autopsy and tissue dissection 
were used to manually align tissue samples 
to three-dimensional (3D) reference atlases 
[Allen Human Reference Atlas 3D https:// 
github.com/BICCN/cell-locator; Julich-Brain
v2.9 parcellation, DOI:10.25493/VSMK-H94
(21)] and a 2D plate-based reference (Allen
Human Reference Atlas http://atlas.brain-map. 
org/). The best matching structure in each ref- 
erence atlas is reported (table S2, Methods) 
and secondary structures are reported when 
more than one cortical area is predicted ac- 
cording to the mapping results (table S2). 
Most tissue samples map to a single area in the 
Allen Human Reference Atlas but MTG sam- 
ples included both the intermediate and caudal 
subdivisions of A21. Mapping to the probabi- 
listic Julich Brain Atlas suggests that several 
areas may have been sampled for ACC (area 33 
and area p24ab) and A1 (area TE 1.0 and area
TE 1.1), and variation in the precise location of 
sampling might result in increased variability 
in the cellular compositions of these areas. 

Three snRNA-seq datasets were generated: a 
10x Chromium v3 (Cv3) dataset with >924,000 
nuclei sampled from all cortical layers, a Cv3 
dataset of >231,000 nuclei captured by micro- 
dissection of layer 5 to enrich for rare layer 5 
extratelencephalic projecting (L5 ET) neurons
(for all areas except AnG and M1), and a
SMART-seqv4 (SSv4) dataset of over 60,000 nuclei
sampled from individual cortical layers to
provide laminar selectivity for all clusters. For 
AnG, only a Cv3 dataset of all cortical layers 
was generated (Fig. 1B). 

Nuclei were assigned to one of 24 cell sub- 
classes based on transcriptomic similarity to 
a reference taxonomy for human M1 (12, 13),
and subclasses were grouped into five neigh- 
borhoods (Fig. 1, C and D). For each area and 
neighborhood, nuclei profiled with Cv3 and 
SSv4 were integrated based on shared coex- 
pression and clustered to identify transcrip- 
tomically distinct cell types. Neighborhood 
clusters were aggregated and organized into 
within-area taxonomies ranging between 120 
and 142 cell types (Fig. 1C and figs. S1 to S8) 
with distinct marker expression (table S3). Cel- 
lular variation within subclasses was quanti- 
fied as the average entropy of variably expressed 
genes. Entropy was higher for all neuronal than 
non-neuronal subclasses and did not differ 
between excitatory and inhibitory subclasses 
or across areas based on a two-way analysis 
of variance (ANOVA) followed by post-hoc 
Tukey HSD tests (Fig. 1E). The number of dis- 
tinct cell types within a subclass was similar 
across areas for inhibitory and non-neuronal 
subclasses but varied for excitatory neuron 
subclasses (Fig. 1F). This within-subclass variation
was not driven by differential sampling
of nuclei across areas (Fig. 1, G and H). In V1, 
there were more layer (L)4 intratelencephalic 
projecting (IT) types, consistent with expan- 
sion and specialization of the thalamorecipient 
L4 in V1, and more L5 IT types. L6 IT Car3 neurons
were more diverse in MTG, A1, and AnG
compared with other areas. L5 ET neurons were 
least diverse in the rostral area ACC and in the
most caudal area V1, whereas L6 corticothalamic-projecting (L6 CT) neurons
were most diverse in M1, S1, MTG, and V1. Individual subclasses had hundreds
of distinct markers in each area (table S4), and 20 to 70% of markers were
conserved across areas (Fig. 1I). For example, Fig. 1J plots expression of a
set of chandelier cell markers that were common across areas (left), and a set
of common markers for all subclasses (right). Excitatory subclasses had the
smallest fraction of conserved markers, pointing to more variable gene
expression in excitatory neurons across the cortex, as reported in mice (5).

Fig. 1. Transcriptomic cell type diversity across human cortical areas.
(A) Eight areas of the neocortex were sampled from four lobes of the adult
human brain. (B) snRNA-seq sampling across areas grouped by RNA-seq
platform, layer dissection strategy, and number of male and female donors.
(C) Schematic of snRNA-seq clustering to generate cell-type taxonomies for each
area. (D) UMAPs of single nuclei from each area based on variable gene
expression and colored by cell subclass as in (J). (E) Distributions of subclass
transcriptomic entropy differ between neuronal (Exc and Inh) and non-neuronal
(NN) classes and not between areas. (F, G, and H) Summary of within-area
taxonomies showing the number of nuclei sampled from each subclass and the
number of distinct clusters (cell types) identified for excitatory (F) and inhibitory
(G) neurons and non-neuronal cells (H). (I) Number of subclass markers in
each area (box plots) and shared across areas (blue points). Box plots show median,
interquartile range (IQR), up to 1.5*IQR (whiskers), and outliers (points).
(J) Heatmaps of conserved marker expression for 50 random nuclei sampled from each
area for chandelier interneurons and horizontally compressed for all subclasses.


Cross-areal abundance changes reveal 
areal specification 


The areas analyzed have distinct cytoarchitec- 
ture based on Nissl staining that shows varia- 
tion in cell size, shape, and laminar organization 
(Fig. 2A), and spanned the rostrocaudal axis of
the cortical sheet (Fig. 2B). Relative proportions of transcriptionally
defined neuronal subclasses varied across areas (Fig. 2C and table S5).
Excitatory neuron subclasses had the greatest differences in proportions
across areas and often reflected known differences in cellular architecture.
Agranular M1 and ACC (Fig. 2A) had L4 IT neurons but at lower proportions
than other areas (12), with the lowest proportion observed in ACC. By
contrast, in V1 where L4 is visibly enlarged, L4 IT neuron proportion was
increased. As described previously in mouse cortex (5) and between human M1
and MTG (12), inhibitory neuron subclasses were similar across areas except
for a marked increase of medial ganglionic eminence (MGE)-derived PVALB
neurons and fewer caudal ganglionic eminence (CGE)-derived interneurons
(LAMP5 LHX6, LAMP5, SNCG, VIP, PAX6) in V1. These proportion differences in
excitatory and inhibitory neurons were validated in situ by labeling of
neuronal subclasses in MTG and V1 using MERFISH spatial transcriptomics
(Fig. 2C, right panels, and table S5), demonstrating that they were not an
artifact of nuclear isolation or snRNA-seq processing.

Fig. 2. Cell subclass composition reflects cytoarchitecture and varies
systematically along the R-C axis. (A) Images of Nissl-stained sections of
cortical areas are labeled with approximate layer boundaries and show distinct
cytoarchitecture. Areas are ordered by position along the R-C axis of the cortex.
(B) Representative cortical gyral locations of sampled tissue. (C) Relative
proportions of neuronal subclasses as a fraction of all excitatory or inhibitory
neurons in each area and estimated based on snRNA-seq profiling or in situ
labeling using MERFISH. Arrowhead directions indicate subclasses that
significantly increase (pointing up) or decrease (pointing down) across areas
based on scCODA analysis. (D) For each donor, subclass proportions were
calculated as a fraction of all neurons in the same class (excitatory or inhibitory)
and grouped by neighborhood (*nominal P < 0.05; **Bonferroni-corrected
P < 0.05). (E) Spearman correlations of excitatory and inhibitory subclass
proportions across areas. Scale bar on (A) is 200 µm.

Subclass proportions were highly consistent across donors despite variation
in the precise location sampled for areas such as MTG (Fig. 2D). Examined
from a subclass perspective, the most obvious proportion differences were
seen in L4 IT (range 10-fold, from 3 to 30% of excitatory neurons), and in
the much sparser L5 ET neurons (range 50-fold, from 0.1 to 5%). Many of these
proportion differences varied in a graded fashion generally along the
rostrocaudal (R-C) axis. Pairwise correlations in excitatory neuron
proportions revealed correlated R-C decreases in L5 ET, L6B, L6 IT, and L5/6
near-projecting neurons (L5/6 NP), with an anticorrelated R-C increase in
L4 IT (Fig. 2E). Among inhibitory subclasses, the rarest types (SNCG, PAX6,
and SST CHODL) had the most correlated changes in proportions with a
decreasing R-C gradient.
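
Pairwise Spearman correlations of subclass proportions across areas can be computed along the lines of the sketch below. The proportions are invented for illustration (they are not values from table S5), the function name is hypothetical, and the simple ranking used here assumes no tied values within a subclass.

```python
import numpy as np

def spearman_matrix(proportions):
    """proportions: (n_areas, n_subclasses) array.

    Returns the subclass-by-subclass Spearman correlation matrix,
    assuming no ties within a column.
    """
    values = np.asarray(proportions, dtype=float)
    # Double argsort converts each column to ranks 0..n-1 (valid without ties).
    ranks = np.argsort(np.argsort(values, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

# Hypothetical proportions for six areas ordered from rostral to caudal.
subclasses = ["L4 IT", "L5 ET", "L6B"]
proportions = np.array([
    [0.03, 0.050, 0.040],
    [0.05, 0.040, 0.035],
    [0.08, 0.030, 0.034],
    [0.10, 0.020, 0.030],
    [0.15, 0.010, 0.020],
    [0.30, 0.001, 0.005],
])
rho = spearman_matrix(proportions)
```

In this toy example L5 ET and L6B decrease together along the axis while L4 IT increases, giving correlations of 1 and -1, respectively.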

Smaller-scale areal specializations in pro- 
portions were overlaid on these broad trends 
of conservation or R-C gradients and many 
subclasses showed a particularly large differ- 
ence in V1 (L5/6 NP, L6B, PAX6). Chandelier 
inhibitory neuron proportions were lowest 
in MTG and AnG and highest in S1 (Fig. 2, C 
and D). There were more L4 IT neurons and
fewer L5 ET neurons in DFC, more L6 IT neurons
in M1, and fewer L6 IT Car3 neurons in
ACC than expected based on the broad trends.
In summary, cell subclass proportions define a 
quantitative cytoarchitecture that is canonical 
in having all 24 subclasses in all areas, with 
varying proportions and gradient properties 
that likely reflect developmental gradients and 
specializations driven by the circuit require- 
ments of functionally distinct cortical areas. 


Excitatory to inhibitory neuron ratio varies 
across cortical areas and layers 


In addition to areal specializations in neuronal 
subclass proportions, we found differences in
relative proportions of excitatory and inhibitory
neurons (E:I ratio) (table S5). As previously reported
for M1 (12), the E:I ratio was 2:1 for most
cortical areas, in contrast to the reported E:I 
ratio of 5:1 in mice. However, the E:I ratio in V1 
(4.5:1) was much higher (Fig. 3A) and compa- 
rable to that of rodents. MERFISH analysis in 
MTG and V1 confirmed these values and areal 
differences (table S5). 
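
As a simple illustration of how an E:I ratio is derived from classified nuclei, the sketch below counts excitatory and inhibitory neurons per group (an area, or an area and layer). The function name and input format are assumptions, and the toy counts are chosen only to reproduce the 2:1 and 4.5:1 ratios quoted above.

```python
from collections import defaultdict

def ei_ratios(cells):
    """cells: iterable of (group, neuron_class) pairs, where neuron_class is
    'excitatory' or 'inhibitory'. Returns {group: E:I ratio}.

    Groups without inhibitory neurons would raise ZeroDivisionError.
    """
    counts = defaultdict(lambda: {"excitatory": 0, "inhibitory": 0})
    for group, neuron_class in cells:
        counts[group][neuron_class] += 1
    return {g: c["excitatory"] / c["inhibitory"] for g, c in counts.items()}

# Hypothetical counts, not data from the study.
cells = ([("MTG", "excitatory")] * 200 + [("MTG", "inhibitory")] * 100
         + [("V1", "excitatory")] * 450 + [("V1", "inhibitory")] * 100)
ratios = ei_ratios(cells)
```

Using (area, layer) tuples as the group key gives the layer-resolved ratios discussed next.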

Layer-specific dissections of nuclei from 7 re- 
gions (excluding AnG) allowed a deeper ex- 
ploration of E:I ratio variation. E:I ratios varied 
by area and layer and were consistent across 
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Fig. 3. E:I ratio variation across cortical areas and layers. (A) Relative 
number of excitatory neurons to inhibitory neurons (E:I ratio) in each area. Bar 
plots indicate average and standard deviation across donors. (B) E:! ratios 
estimated for a common set of layers dissected from each area. Box plots show 
median, interquartile range (IQR), up to 1.5*IQR (whiskers), and outliers (points) 
across multiple donors. (C) Validation of increased E:I ratios in all cortical 
layers in V1 compared with MTG based on MERFISH experiments. Bar plots and 
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whiskers indicate average and standard deviation of E:| ratios across donors, 
respectively. (D) E:I ratios estimated for all layers dissected from each area. 
(E) Laminar distributions of interneurons were conserved (SNCG) or divergent 
(LAMP5 LHX6) across areas based on counts of layer-dissected nuclei. Note 
that primary sensory areas (S1, Al, and V1) have a distinct distribution of LAMP5 
LHX6 neurons. (F) MERFISH in situ labeling of LAMP5 LHX6 cells shows a 
decreased proportion of cells in layer 6 of V1 compared with MTG. 
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donors (Fig. 3B). However, increased variability 
was seen in MTG perhaps due to sampling of 
both the intermediate and caudal subdivisions 
of A21. V1 had the highest E:I ratio across all 
layers, not just in L4 (7:1) but also in L5 and 
L6, where the highest E:I ratio of 10:1 was seen. 
Moreover, there was a monotonic increase in 
the E:I ratio of other areas along a R-C gradi- 
ent, which was most apparent in L2/3. E:I 
ratios were more variable in L4 and L5, masking 
the trend in overall E:I ratios (Fig. 3A). In situ 
cell counts in MTG and V1 using MERFISH 
confirmed a higher E:I ratio in all layers of V1 
(Fig. 3C). From a within-area perspective, E:I 
ratios increased with cortical depth, with the 
highest ratios in L6 for all areas (Fig. 3D). 
Furthermore, the E:I ratio in L4 was distinctly 
elevated in V1 relative to L2, L3, and L5, high- 
lighting specialization of visual processing com- 
pared with other sensory modalities. Finally, 
laminar distributions of excitatory and inhib- 
itory neurons were relatively consistent across 
cortical areas (fig. S9), such as SNCG in L1 (Fig. 
3E), but some areal and laminar variation was 
apparent, such as LAMP5 LHX6 proportions 
in L6 (Fig. 3, E and F). Taken together, E:I 
ratios vary extensively both by layer and area, 
with markedly different ratios in V1 and areal 
variation that is masked by averaging across 
cortical layers. 
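
As an illustration of how per-area and per-layer E:I ratios of this kind can be tabulated from nucleus counts, the following minimal Python sketch may help. It is not the authors' pipeline: the input schema (tuples of area, layer, cell class, and count) and the function name `ei_ratios` are assumptions made for the example, and the toy counts are invented (the V1 values are chosen only so that they reproduce the 7:1 and 10:1 ratios quoted above).

```python
from collections import defaultdict


def ei_ratios(counts):
    """Return {(area, layer): excitatory/inhibitory ratio}.

    counts: iterable of (area, layer, cell_class, n_nuclei) tuples, where
    cell_class is "excitatory" or "inhibitory" (hypothetical schema).
    Groups without inhibitory nuclei are skipped to avoid division by zero.
    """
    totals = defaultdict(lambda: {"excitatory": 0, "inhibitory": 0})
    for area, layer, cell_class, n in counts:
        if cell_class in ("excitatory", "inhibitory"):
            totals[(area, layer)][cell_class] += n
    return {
        key: t["excitatory"] / t["inhibitory"]
        for key, t in totals.items()
        if t["inhibitory"] > 0
    }


# Invented toy counts, not data from the study.
toy_counts = [
    ("V1", "L4", "excitatory", 700), ("V1", "L4", "inhibitory", 100),
    ("V1", "L6", "excitatory", 500), ("V1", "L6", "inhibitory", 50),
    ("MTG", "L4", "excitatory", 400), ("MTG", "L4", "inhibitory", 200),
]
ratios = ei_ratios(toy_counts)
```

Keeping the layer in the grouping key matters here: pooling all layers of an area before dividing would hide exactly the laminar differences described in this section.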


Transcriptomic cellular topography 


To characterize the transcriptomic landscape of 
neuronal subclasses across cortical areas, neuronal nu- 
clei were integrated by donor for each of four 
neighborhoods [IT-projecting excitatory, deep 
layer (non-IT) excitatory, MGE-derived GABAergic 
and CGE-derived GABAergic] and visualized as 
UMAPs colored by subclass (Fig. 4A) and area 
(Fig. 4B). Three organizational principles were 
apparent. First, excitatory neurons had strong 
areal signatures, visualized as clear banding by 
area, whereas inhibitory neurons were mostly 
intermixed across areas similar to reports in 
the mouse cortex (15). Second, there were visi- 
ble V1 specializations including substantial ex- 
pansion of L4 IT neurons and distinct islands 
in the UMAPs for most IT-projecting subclasses 
and L6 CT neurons. Distinct V1 islands were 
also seen for parts of the PVALB and SST sub- 
classes (arrows in MGE-derived UMAPs). Third, 
the areal similarity of excitatory neurons ap- 
peared to vary in a R-C topographic order for 
many subclasses, similar to prior reports of 
gene expression similarity across the human 
cortex (22). Neighboring areas were similar and 
intermixed despite known functional distinc- 
tiveness; for example, nuclei from M1 and S1 
intermingled despite their specificity for mo- 
tor and somatosensory functions, respectively. 

Areal variation in gene expression mirrored 
the UMAP trends. The number of differentially 
expressed genes (DEGs, table S6) across areas 
was largest for excitatory neurons (Fig. 4C), and 
highest for L4 IT and L5 ET subclasses (over 
1000 DEGs). DEGs for inhibitory neuron sub- 
classes varied widely, from over 100 DEGs for 
SST and PVALB interneurons to fewer than 
10 DEGs for SNCG and SST CHODL and a 
single DEG (ADAMTS9-AS2) for PAX6. Non- 
neuronal cell subclasses similarly displayed 
few areal DEGs. We next used a previously de- 
fined tau score (23) to identify area-specific mark- 
ers (table S7), which were much less common. 
Excitatory neurons expressed the most areal 
markers and ACC and V1 were the most distinct 
areas (Fig. 4D and fig. S10, A and B). IT-projecting 
neurons were specialized in both ACC and V1 
whereas Non-IT L6 CT and L5 ET neurons 
were specialized mostly in V1. 
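
For readers unfamiliar with tau scores, the sketch below shows one common formulation of the index, in which each area's expression is compared with the peak area. This is an assumption: the text only cites a previously defined tau score (23), so the exact definition, normalization, and thresholds used in the study may differ. The function name `tau_score` is hypothetical.

```python
def tau_score(expression):
    """Tau specificity index for one gene across n >= 2 areas.

    expression: non-negative mean expression values, one per area.
    Returns 0.0 for uniform expression and 1.0 when only one area
    expresses the gene.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two areas")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # gene not detected in any area: treat as non-specific
    return sum(1 - x / peak for x in expression) / (n - 1)


uniform = tau_score([2.0, 2.0, 2.0, 2.0])      # same level everywhere
single_area = tau_score([0.0, 0.0, 0.0, 5.0])  # restricted to one area
two_areas = tau_score([5.0, 5.0, 0.0, 0.0])    # shared by two areas
```

Under this definition a gene counts as an areal marker only when its score is close to 1, which is consistent with markers being much less common than DEGs.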

The topographic ordering of the excitatory 
neuron subclasses above suggested graded 
changes as a function of distance, similar to 
bulk tissue profiling studies reporting gradual 
changes in gene expression across the cortical 
sheet (22). We therefore calculated transcrip- 
tomic similarities of excitatory subclasses as a 
function of the approximate physical distance 
between pairs of areas on an unfolded cortical 
sheet (Fig. 4E and table S2). Because V1 was so 
distinct (Fig. 4A), we fit two linear models of 
subclass similarity versus areal distance, one 
that included pairwise comparisons to V1 and 
one that did not (Fig. 4E). All excitatory neuron 
subclasses showed the same monotonic de- 
crease of similarity with distance but had differ- 
ent amounts of transcriptomic specialization 
in V1 (intercepts, but not slopes, are different 
in Fig. 4E). Interneuron similarity also decreased 
with distance at the same rate for all subclasses, 
albeit at about 40% the rate of excitatory neu- 
rons, and with much less specialization in V1 
(fig. S10C). By contrast, non-neuronal expres- 
sion did not change systematically with inter- 
areal distance and was not more specialized 
in V1 (fig. S10D). 
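
The comparison of slopes and intercepts described above can be sketched with two ordinary least-squares fits, one for area pairs that include V1 and one for pairs that do not. This is a minimal illustration under stated assumptions: the grouping follows the Fig. 4E legend, the function names and the tuple layout are hypothetical, and the distances and similarities are invented so that both groups share a slope but differ in intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_v1(pairs):
    """Fit separate lines for area pairs with and without V1.

    pairs: list of (area_a, area_b, distance, similarity) tuples
    (hypothetical layout; similarity would be a Spearman correlation).
    """
    with_v1 = [(d, s) for a, b, d, s in pairs if "V1" in (a, b)]
    without_v1 = [(d, s) for a, b, d, s in pairs if "V1" not in (a, b)]
    return {
        "with_V1": fit_line(*zip(*with_v1)),
        "without_V1": fit_line(*zip(*without_v1)),
    }


# Invented values: same slope (-0.005), lower intercept for V1 pairs.
toy_pairs = [
    ("M1", "S1", 10, 0.85), ("DFC", "M1", 40, 0.70), ("ACC", "AnG", 80, 0.50),
    ("AnG", "V1", 20, 0.60), ("S1", "V1", 60, 0.40), ("DFC", "V1", 100, 0.20),
]
fits = fit_by_v1(toy_pairs)
```

In this toy case the two fitted lines are parallel and offset, which is the pattern the text describes: a shared rate of decline with distance plus an additional, subclass-dependent V1 offset.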

To determine how gene expression varied 
across the cortical sheet, we performed a var- 
iance partitioning analysis for each subclass 
(fig. S10E and table S8). Expression of hun- 
dreds of genes was explained by area identity 
and spatial gradients for excitatory neurons 
compared with a few genes for inhibitory neu- 
rons and non-neuronal cells. These genes had 
a similar proportion of expression variation 
explained by area or gradients (median 5 to 
10%) in all subclasses (fig. S10F). Therefore, 
the observed differences in transcriptomic to- 
pography (Fig. 4E) were mainly due to the 
number of genes with areal variation and not 
to the strength of that variation. Among IT- 
projecting neurons, some genes showed distinct 
patterning in a single subclass whereas other 
genes were topographically patterned in all 
IT subclasses (fig. S10G). We calculated the 
expression variance explained by gradients 
along three axes: rostrocaudal (R-C), midline- 
surface (M-S, anatomical left to right), and 
dorsoventral (D-V). The relative position of each 
cortical area along these axes was calculated 
using the voxel coordinates corresponding 
to the approximate center of each area in the 
Allen Human 3D Reference Atlas (table S2). 
For genes with at least 5% of expression var- 
iance explained by any gradient, we quantified 
the relative strength of gradients based on the 
relative proportion of expression variance that 
was explained (fig. S10H). For most subclasses, 
R-C gradients were dominant but non-IT sub- 
classes also expressed many genes with M-S 
and D-V gradients (Fig. 4F and fig. S10H). 

A set of R-C genes was defined for each sub- 
class by requiring a Spearman correlation >0.7 
between expression and areal position along 
the R-C axis and a correlation >0.5 after ex- 
cluding V1 and ACC as these are transcrip- 
tionally distinct regions that might bias the 
identification of gradients. For the most vary- 
ing L4 IT and L5 ET neurons, roughly equal 
numbers of genes increased and decreased ex- 
pression rostrocaudally (Fig. 4G). For other 
subclasses, many more genes increased rather 
than decreased expression along the R-C axis. 
The correlations of R-C genes were greater than 
correlations to a randomly shuffled ordering 
of areas for most neuronal subclasses (fig. S10I). 
Genes with a R-C gradient in one subclass fre- 
quently had a gradient in the same direction 
in other subclasses that expressed the gene 
(fig. S10J), such as CBLN2 in L2/3 IT and L4 IT 
neurons (Fig. 4H), which is expressed in a 
similar gradient in maturing cortical neurons 
during human prenatal development (24). How- 
ever, some genes such as DCC had opposing 
gradients in different subclasses (L5/6 NP and 
VIP), and some functionally related genes had 
opposing gradients in the same subclass, such 
as the cell adhesion molecules Contactin 5 and 
6 (CNTN5 and CNTN6) in L5 IT neurons (Fig. 
4H). Based on gene ontology (GO) analysis, 
genes with strong areal enrichment or R-C 
gradients included voltage-gated potassium 
and calcium channels (table S7). Notably, only 
R-C genes were associated with axon guidance 
pathways including SLIT/ROBO, ephrin, and 
semaphorin signaling molecules (table S7), 
likely reflecting developmental patterning of 
connectivity. 
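
The two-threshold rule used to define R-C genes (correlation >0.7 across all areas and >0.5 after excluding V1 and ACC) can be sketched as follows. Several details are assumptions made for the example rather than statements about the authors' code: the thresholds are applied to the absolute Spearman correlation so that both increasing and decreasing genes qualify, the two correlations are required to have the same sign, and areal positions are given as simple ordinal ranks instead of atlas voxel coordinates. All function names are hypothetical.

```python
def _ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: no monotonic trend
    return cov / (vx * vy) ** 0.5


def is_rc_gene(expr, position, excluded=("V1", "ACC"),
               full_min=0.7, reduced_min=0.5):
    """Apply both correlation thresholds to one gene in one subclass."""
    areas = sorted(position, key=position.get)
    rho_all = spearman([position[a] for a in areas], [expr[a] for a in areas])
    kept = [a for a in areas if a not in excluded]
    rho_kept = spearman([position[a] for a in kept], [expr[a] for a in kept])
    same_direction = rho_all * rho_kept > 0
    return abs(rho_all) > full_min and abs(rho_kept) > reduced_min and same_direction


# Hypothetical ordinal R-C positions for the eight areas (0 = most rostral).
rc_position = {"ACC": 0, "DFC": 1, "M1": 2, "S1": 3,
               "A1": 4, "MTG": 5, "AnG": 6, "V1": 7}
graded = {a: 1.0 + p for a, p in rc_position.items()}        # steady increase
declining = {a: 8.0 - p for a, p in rc_position.items()}     # steady decrease
v1_only = {a: (10.0 if a == "V1" else 1.0) for a in rc_position}
```

The `v1_only` case shows why the second threshold is useful: a gene that is flat everywhere except V1 produces a moderate correlation over all eight areas but none once V1 and ACC are removed, so it is not counted as a gradient gene.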


Cross-areal consensus taxonomy 


Next, we sought to understand finer cell-type 
areal variation by clustering integrated cell 
neighborhoods (Fig. 4) to identify a set of cell 
types either common to or varying across cor- 
tical areas (Fig. 5A). We defined and organized 
153 cell types by transcriptomic similarity into 
a consensus taxonomy (Fig. 5B). Consensus 
cell types had consistent markers across areas 
(table S9), were represented in all donors (Fig. 
5C), and ranged from 0.01 to 20% of excitatory 
and inhibitory neurons and from 0.1 to 30% of 
non-neuronal cells (Fig. 5D). Most types were 
found in all areas, with generally uniform rep- 
resentation across areas for inhibitory neu- 
ron types and non-neuronal cells (Fig. 5E). 
However, we also identified area-enriched or 
area-specific cell types, particularly in V1 (dark 
blue). V1-enriched clusters were seen in most 
excitatory subclasses, particularly L4 IT, as 
well as SST and several PVALB and VIP types. 
There was also one ACC-selective VIP type. Sim- 
ilarity by proximity was evidenced by cross- 
areal excitatory cell types common to neigh- 
boring regions (M1 and S1, MTG and AnG). 
To study changes in the relative abundances 
of cell types while accounting for the composi- 
tional nature of the snRNA-seq data, we applied 
a Bayesian model (scCODA) (25). Nuclei were 
grouped by consensus types and iteratively 
tested for consistent differences using each 
type as the “unchanged” reference population. 
All subclasses included consensus types with 
both increased and decreased proportions (table 
S10), except for PAX6 inhibitory type abun- 
dances that were uniformly decreased in V1. V1 
had the most consensus types with abundance 
changes (92 of 153, 60%), including two types 
with the largest changes (L4 IT_5 and L2/3 
IT_2). Excitatory neurons were the most spe- 
cialized in V1, but several SST, PVALB, and VIP 
consensus types were also specific to V1. Spe- 
cialized types were also found in other areas, 
including L2/3 IT and L5/6 NP excitatory 
types (L2/3 IT_3, L2/3 IT_4, L5/6 NP_3 and 
L5/6 NP_6) in M1 and S1, SST types (SST_4 
and SST_10) in ACC, and distinct L5 ET types 
across the R-C axis. These substantial changes, 
along with more subtle abundance changes 
(median = 17 consensus types affected in each 
area), are likely important determinants of the 
functional role of each area. 

Fig. 4. Transcriptional topography across cortical areas. (A and B) UMAPs 
showing transcriptomic similarities of single nuclei dissected from eight cortical 
areas and colored by neuronal subclass (A) and area (B) for excitatory and 
inhibitory neuron neighborhoods. Arrows indicate V1-specialized neurons. Curved 
arrows illustrate R-C ordering of areas on the cortical sheet. (C) The number of 
genes that are significantly differentially expressed across areas for each 
subclass grouped by neighborhood. Subclasses with 0 or 1 DEG are labeled. See 
table S6 for all DEGs. (D) The number of genes that have highly enriched 
expression in a single area for each subclass. (E) Spearman correlations of 
expression similarity between pairs of areas as a function of the approximate 
physical distance along an unfolded neocortical sheet. Pairwise comparisons that 
include V1 (blue points) or do not include V1 (red) are grouped separately 
because V1 is so transcriptomically distinct. (F) Ternary plot summarizing the 
relative proportion of variance explained by expression gradients across areas 
along R-C, M-S (anatomical left to right), and D-V axes for each subclass. 
Point size indicates the number of genes with >5% of variance explained by at 
least one gradient, and point location shows the weighted mean proportion 
across all genes (shown in fig. S10H). Points are colored by cell neighborhood. 
(G) For each subclass, the number of genes with expression that increases 
(R-C) or decreases (C-R) in areas ordered by their position along the R-C axis. 
(H) Examples of genes with R-C gradient expression that have been previously 
described in development (CBLN2) (32), have opposing gradients in different 
subclasses for the same gene (DCC), or for two related genes (CNTN5 and 
CNTN6) involved in neuronal connectivity for the same subclass. 

Fig. 5. Cross-areal consensus taxonomy. (A) Schematic of data integration 
across donors used for each neighborhood to generate the cross-area consensus 
taxonomy. (B) Consensus taxonomy of cell types across eight areas. (C) Proportion 
of nuclei in each consensus type dissected from each donor. (D) Consensus type 
proportion including nuclei from all areas as a fraction of cell class. Individual 
dots indicate proportions measured per donor. (E) The relative number of nuclei 
dissected from areas that contribute to each consensus cell type. (F) Changes 
in consensus type proportions across areas based on compositional analyses 
of neurons and non-neuronal cells using scCODA. Larger magnitude changes 
are indicated by darker colors. See table S10 for proportion effect sizes. 


V1 specializations 


The distinctiveness of V1 was reflected in the 
transcriptomic data for specific cell types. Con- 
sidering cell types with >60% membership in V1 
compared with other areas to be V1-specialized, 
we identified specialized cell types in every 
excitatory subclass except L5/6 NP, with the 
greatest number of V1-specialized types in 
the L2/3 IT and L4 IT subclasses (Fig. 6A and 
table S11). Unexpectedly, given prior reports of 
common GABAergic neurons across the mouse 
neocortex (15, 17), V1 had a number of special- 
ized CGE- and MGE-derived types. 
MERFISH analysis of V1 demonstrated the 
spatial organization of all cell types (fig. S11, A 
and B). L2/3 IT types had distinct markers (table 
S12), sublaminar distributions, and relative pro- 
portions (Fig. 6B). L2/3 IT5 and L2/3 IT2 clearly 
delineated L2 and L3 from one another, re- 
spectively. Other L2/3 IT types were more 
sparsely distributed in L2 (L2/3 IT4), L3 (L2/3 
IT3), or both (L2/3 IT1 and 6). L2/3 IT types 
were also found in layer 4A (L2/3 IT2) and the 
superficial part of layer 4B (L2/3 IT3), and 
these types were V1-specialized. Conversely, 
several L4 IT types were found in L4A and L4B 
and into the deep part of L3 (L4 IT1 and 3, Fig. 
6D). Thus, the specialized L4A and L4B con- 
tain not only L4 IT-type neurons, but also L2/3 
IT-type neurons. This finding may help resolve 
ongoing questions about primate V1 L4A and 
L4B, which contain both stellate (L4 IT-like) 
and pyramidal corticocortical projection neu- 
rons (L2/3 IT-like) (26). 

L4 in V1 is highly distinctive even in unlabeled 
tissues as a result of the band of myelinated 
thalamocortical axons entering L4 that form 
the stria of Gennari. This distinctiveness was 
also seen at the level of L4 IT neuron types, all 
but one of which were V1-specialized (Fig. 6, C 
and D). L4 IT types had specific markers (table 
S12) and sublaminar distributions, from dense 
pan-L4 (L4 IT3) to sublayer-specific distribu- 
tions. Layers 4Cα and 4Cβ receive selective in- 
puts from magnocellular and parvocellular 
layers of the thalamic lateral geniculate nu- 
cleus, respectively (27), and MERFISH revealed 
localization of specific types to each sublayer. 
L4 IT5 was selectively localized in 4Cβ, where- 
as L4 IT2 was enriched in 4Cα but extended 
into L4B, consistent with the fuzzy boundary 
between 4Cα and 4B described in other hu- 
man studies (28). Sparser L4 IT types were 
scattered across layers. Together, these results 
illustrate the cellular specialization of the dis- 
tinctive input layer of V1 and reveal a com- 
plexity of putative thalamorecipient stellate 
neurons that offers many avenues for future 
exploration. 

Fig. 6. V1 cell-type specialization. (A) Transcriptional distinctiveness of cell 
types in the V1 taxonomy. Cell types with specificity >0.6 are considered V1- 
specialized and are highlighted in blue (see table S11). (B) Laminar distributions of 
specialized (blue text) and common (gray) L2/3 IT types based on MERFISH in situ 


L6 CT neurons that send reciprocal projec- 
tions to the LGN were also highly specialized 
in V1 (Fig. 6A), with two distinct types that 
expressed many V1-enriched genes (fig. S11C). 
Gene set enrichment analysis showed enrich- 
ment for calcium signaling, axon guidance, and 
axonal and synaptic compartments, including 
axon guidance molecules CDH7, EPHA6, and 
SEMA6A (fig. S11D). Various ion channels (KCNT2 
and SCN1B) and synaptic genes (SYT6), as well 
as calcium and calmodulin signaling-associated 
genes (PCP4, NPY2R), were similarly enriched, 
and several of these have conserved V1 enrich- 
ment in monkeys (29). Myelin basic protein 
(MBP), normally described in oligodendrocytes 
but known to function in certain neurons as 
part of a Golli-MBP complex (30), was also en- 
riched in V1 L6 CT neurons. 

V1 also contained specialized GABAergic in- 
terneuron types. Most were SST types (Fig. 6, 
E and F, and table S12), as well as one PVALB 
and two VIP types (fig. S11A). The SST types 
common across V1 and other cortical areas 
were concentrated in L2 with sparser repre- 
sentation in other layers. By contrast, the V1- 
specialized SST types were concentrated in L4 
near the V1-specialized L4 IT types, suggestive 
of a relationship between these specialized 
excitatory and inhibitory types. 


L5 ET neuron diversity 


Neocortical L5 ET neurons are sparse and cap- 
turing them required additional L5-specific 
sampling. L5 ET neurons were most abun- 
dant in ACC and their abundance generally 
decreased along the R-C axis (Fig. 2D). V1 had 
the lowest proportion of L5 ET neurons (~0.1% 
of excitatory neurons), consistent with data 
from macaque monkeys demonstrating pro- 
jections to subcortical targets such as the su- 
perior colliculus from large, very sparse neurons 
(Meynert cells) localized to deep layers in V1 
(fig. S12A) (31-33). 

We identified 4 consensus L5 ET types (Fig. 5 
and Fig. 7A), several of which were dominated 
by nuclei derived from cortical areas near each 
other. M1 and S1 predominantly contributed 
to L5 ET 1, whereas L5 ET 3 was largely com- 
posed of nuclei from MTG and A1 (and to a 
lesser extent AnG), again suggesting similarity 
based on topographic position. V1 specializa- 
tion was also apparent in L5 ET types, with 
only a single type (L5 ET 4) consisting of nuclei 
predominantly from V1 (Fig. 7, A and B). L5 ET 
neurons could be divided into at least two 
transcriptomically distinct subtypes in most 
regions (Fig. 7B). M1 had 3 distinct subtypes 
and we showed previously (12) that at least 
two L5 ET M1 subtypes included Betz cells. 
labeling experiments. (C and E) Scaled expression of marker genes of V1-specialized 
(blue labels) and common (black) L4 IT (C) and SST (E) types across areas. 
Dendrograms were pruned from the V1 taxonomy in (A). (D and F) Laminar distributions 

of specialized and common L4 IT (D) and SST (F) types based on MERFISH experiments. 


Notably, despite having the highest propor- 
tion of L5 ET neurons of all areas, only one 
subtype was identified in ACC, implying that 
VENs in ACC likely do not represent a distinct 
transcriptomic cluster, consistent with our pre- 
vious findings in frontoinsula (34). 
L5 ET neurons had more genes with variable 
expression across areas than any other cell type 
(fig. S10F). Up to 32% of variation in gene ex- 
pression across areas was explained by gra- 
dients along the M-S (anatomical left to right), 
D-V, and R-C axes (Fig. 7C). Top gradient genes 
included a glutamate receptor subunit (GRID2), 
a semaphorin (SEMA3D), and a neuropilin 
(NRP1) that are involved in trans-synaptic sig- 
naling and connectivity (Fig. 7C). Some gene 
expression varied across areas but not as a 
gradient, such as DGKB, which was selectively 
down-regulated in primary sensory areas (A1, 
S1, and V1). L5 ET neurons also expressed dis- 
tinct areal markers (Fig. 7D and table S13), in- 
cluding the voltage-gated potassium channel 
KCNG2 in ACC, glutamate receptor subunit 
GRIK1 in MTG and AnG, and ANK1 in V1, a 
gene that encodes for the scaffolding protein 
Ankyrin 1 (35). Applying GO enrichment anal- 
ysis to L5 ET areal markers identified enriched 
pathways associated with synaptic signal- 
ing, connectivity, and intrinsic neuronal firing 
properties (Fig. 7E), consistent with known 
areal variation in firing properties. 

Fig. 7. L5 ET-projecting neuronal diversity. (A) UMAPs of L5 ET neurons 
labeled by area and cross-area consensus type. (B) Within-area L5 ET 
subtypes for each area shown in the same UMAP space as (A). (C) Bar plots 
summarizing the expression variance explained by human donor, L5 ET subtype, 
and four types of variation across areas: R-C, M-S (anatomical left to right), 
and D-V gradients and more complex patterns or in a single area (area). 
For the four types of areal variation, the distribution of expression across 
areas is shown for one of the top five genes. (D) Examples of genes with L5 ET 
neuron expression restricted to one or a few areas. (E) Number of genes in 
the top 10 significantly enriched terms from gene ontology (GO) analyses 
(biological process, BP; cellular component, CC; molecular function, MF) of L5 ET 
areal markers (table S13). 


Glial specialization 


Non-neuronal cells comprised at least 40 to 
65% of cortical cells across areas based on flow 
cytometric analysis of dissociated nuclei la- 
beled with the neuronal marker NeuN and 
gated based on NeuN fluorescence intensity 
(fig. S12A). However, these proportions under- 
estimate the total non-neuronal population 
because vascular cells, including endothelial 
cells and VLMCs, are difficult to dissociate 
(36) and are undersampled in the snRNA-seq 
dataset based on in situ labeling with MER- 
FISH (fig. S12B) (37). M1 and S1 had a higher 
proportion of non-neuronal (NeuN-negative) 
cells than other areas, and snRNA-seq data 
showed that this was driven by an expansion 
of oligodendrocytes relative to OPCs, astro- 
cytes, and microglia (fig. S12, B, C, and D), con- 
sistent with neuroimaging studies showing 
that these areas are the most heavily myeli- 
nated in the cortex (38, 39). By contrast, areas 
described to be among the most lightly mye- 
linated in the cortex (ACC and DFC) (38) had 
the lowest proportion of oligodendrocytes (fig. 
S12, B and C). 

Non-neuronal cells were grouped into ma- 
jor subclasses based on conserved marker 
expression (fig. S12F), and many subclasses 
could be further divided into distinct subtypes. 
Astrocytes were subdivided into previously de- 
scribed protoplasmic, interlaminar (ILM), and 
fibrous types, which also had robust markers 
across areas (fig. S12G). Consistent with pre- 
vious reports of shared non-neuronal types 
across the cortex (15, 17), there was little areal 
expression signature for most non-neuronal 
cell types (Fig. 8A and fig. S12E). However, 
areal variation in protoplasmic—but not ILM 
or fibrous astrocytes—was apparent, consistent 
with previous descriptions of brain-wide as- 
trocyte heterogeneity (40, 41) and variation 
in astrocytes across cortical and hippocampal 
areas in mice (42). Protoplasmic astrocytes 
from ACC showed clear banding on the UMAP 
(Fig. 8A) and distinct areal marker gene ex- 
pression (Fig. 8B). 

Laminar distributions varied across areas 
for all glial subclasses (Fig. 8C and fig. S12H). 
There was a notable depletion of astrocytes in 
L4A and L4B of V1 but not in L4 of other sen- 
sory or granular cortices (Fig. 8C). To validate 
this finding, we compared in situ expression of 
the astrocyte marker GFAP in V1 and DFC. 
GFAP protein and gene expression was re- 
duced in L4B of V1 (Fig. 8D), and only protein 
expression was reduced in L4 of DFC (fig. S12I) 

Fig. 8. Areal specialization of astrocytes. (A) UMAP of non-neuronal cells 
labeled by cortical area. (B) UMAPs of astrocyte expression for genes with areal 
enrichment. Arrows in (A and B) show grouping of nuclei from ACC on the UMAP. 
(C) Laminar distributions of astrocytes vary across areas and are depleted in V1 L4A 
and L4B. (D) GFAP IF and ISH illustrate variable laminar distributions and 

based on immunofluorescence (IF) and in situ 
hybridization (ISH) labeling of adult human 
tissue. In V1, a band of dense GFAP labeling 
was apparent in L6A and L6B, which tapered 
off in the underlying white matter. GFAP IF in 
V1 revealed a population of astrocytes that ex- 
tended long processes away from the white 
matter and into L5, similar to descriptions of 
varicose projection astrocytes (VPA) that are 
distinctive to humans and great apes and not 
found in the cortex of other anthropoid pri- 
mates (43, 44) (Fig. 8D and fig. S12I). Deep-layer 
astrocytes in DFC did not extend long pro- 
cesses and had morphologies typical of proto- 
plasmic and fibrous astrocytes (fig. S12J). 

The spatial organization of astrocytes in V1 
was further investigated using MERFISH (fig. 
S8). Based on laminar distributions (Fig. 8E) 
and marker gene expression (Fig. 8F), there 
were two subtypes of protoplasmic (Astro_1 
and Astro_3) and ILM (Astro_2 and Astro_5) 
astrocytes and one fibrous subtype (Astro_4). 
In contrast to prior descriptions of protoplas- 
mic astrocytes as relatively homogenous cells, 
protoplasmic subtypes in V1 displayed dis- 
tinct laminar patterns with Astro_1 localized 
to the sublayers of L4 and Astro_3 spread 
across L2-6 but absent in L1, L6B, and white 
matter. Astro_1 markers were related to en- 
ergy metabolism, including mitochondrial genes 
COX1 (Fig. 8F), COX2, and COX3, and Astro_1 
cells may represent highly active protoplasmic 
astrocytes (an Astro_3 cell state) rather than a 
developmentally distinct type. Astro_5 cells 
were mostly restricted to the L1-pial border, 
whereas Astro_2 cells were enriched in the 
deeper part of L1. These subtypes likely rep- 
resent pial and subpial ILMs (45, 46), respec- 
tively. The putative subpial ILM type (Astro_2) 
included a small number of cells localized to 
deep L6. Because ILMs and VPAs have previ- 
ously been shown to express shared marker 
genes (AQP4 and CRYAB, Fig. 8F) and have 
similar morphologies (44, 45), these deep-layer 
Astro_2 cells may represent a type of VPA. 
However, further work will be needed to fully 
characterize the diversity of astrocyte mor- 
phologies across the cortex and their relation- 
ships to transcriptomic astrocyte types. 


Discussion 


The cellular complexity of the cortex has chal- 
lenged generations of neuroscientists aiming 
to understand the structural basis of cogni- 
tive function. The BRAIN Initiative Cell Census 
Network established a paradigm for mapping 
cortical cellular diversity, developed methods 
to work across species, and established the 

morphologies of astrocytes in V1 and validates depletion in L4A and L4B. 
Single channel IF images were inverted to increase visibility of GFAP IF. Scale 
bars: IF columns (100 µm), GFAP tracing images (15 µm), ISH (200 µm). 
(E) Laminar distributions of astrocyte subtypes in V1 based on MERFISH in situ 
labeling experiments. (F) Pan-astrocyte and subtype marker expression. 

concordance of a transcriptomic cellular clas- 
sification with other cellular properties in a 
way that integrates prior literature while iden- 
tifying greater cellular diversity than previous- 
ly appreciated (12, 13, 16, 18, 19, 47). We used 
these principles to analyze a series of human 
cortical areas, building on our highly anno- 
tated M1 taxonomy (12). Because the cortex 
has a common organization as well as graded 
changes and areal specializations, we applied 
two complementary strategies to define cell 
types. First, each area was analyzed indepen- 
dently, transferring labels from the M1 taxon- 
omy to other areas, which provides the highest 
resolution clustering in each area and identi- 
fies a common cell subclass organization. Next, 
data from all areas was analyzed jointly, iden- 
tifying a set of consensus clusters present in 
multiple areas while also capturing specialized 
cell types distinct to a single area. Similar joint 
analysis strategies have been used on mouse 
cortex with comparable results (17). 

A key finding of this study is that all 24 sub- 
classes first identified in M1 are found in all 
cortical areas analyzed here, substantiating the 
idea that there is a common cellular organiza- 
tion across the cortex. This was true even for L4 
IT neurons, which were present in agranular 
ACC and M1 (22, 48-50). Each cortical area 
analyzed could be defined as a distinct propor- 
tional makeup of cell subclasses. Proportional 
differences were mostly due to variation in 
excitatory neuron subclasses, which could be 
large (10- to 50-fold). Finer cell-type analysis 
demonstrated substantial areal variation where 
distant areas had distinct gene expression and 
some cell types clustered separately. Thus, both 
a canonical and a noncanonical architecture 
were apparent, depending on the granularity 
of cellular detail analyzed. 

Topographic variation as a function of R-C 
position was a clear organizational feature. Prior 
microarray-based analysis of human (22) and 
macaque (29) cortex showed that molecular 
similarity varies as a function of distance on 
the cortical sheet that likely mirrors early de- 
velopmental gradients (51, 52). Here we see 
comparable variation by R-C position and 
similarity as a function of distance, predomi- 
nantly in select cell types. As in the mouse cortex 
(15), areal variation was mostly in excitatory 
and not inhibitory neurons (except in V1). These 
results are consistent with the fact that most 
inhibitory neurons migrate from the gangli- 
onic eminences and are relatively homoge- 
neous across the cortex, whereas excitatory 
neurons are generated from progenitor cells 
with developmental gradients that are main- 
tained in postmitotic neurons. R-C variation 
was seen not just in gene expression but also in 
excitatory neuron proportions. L4 IT neuron 
proportions increased from rostral to caudal, 
whereas L5 ET neuron proportions showed a 
negative correlation, suggesting that develop- 
mental gradients likely sculpt the cortical cel- 
lular makeup. Similar R-C variation in cellular 
morphology in macaque monkeys supports 
this idea (53). 

The molecular and cell-type distinctiveness 
of V1 in the present study mirrors the special- 
ized cytoarchitecture of V1 in humans, primates, 
and other binocular mammals, unlike in mice 
in which V1 is similar to other cortical areas (17). 
V1 was more molecularly distinct than expected 
by topographic position, consistent with previ- 
ous bulk microarray analysis (22) and expan- 
sion of thalamorecipient L4 was reflected in 
increased L4 IT proportions. There has been a 
long-standing debate about the cellular make- 
up of L4A and L4B in V1, which have alter- 
natively been called 3Bβ and 3C, respectively 
(26, 54). Our results show that L4A and L4B 
contain both L2/3 IT and L4 IT neuron types, 
providing a potential explanation for this con- 
fusion. Additional cellular diversity that does 
not strictly obey laminar boundaries compli- 
cates this organization, similar to previous 
work showing lack of strict laminar organization
in human MTG (11).

The balance of excitation and inhibition is 
thought to be critical to proper balance of neu- 
ronal circuitry (55). E:I ratios of ~4:1 in the 
human frontal cortex (56) and 4:1 in monkey 
V1 (57, 58) have been reported based on GABA 
immunohistochemistry. Transcriptomic quan- 
tification of cell proportions indicates a 5:1 E:I 
ratio in mouse cortex (12) versus a 2:1 ratio in
human MTG and M1, which was confirmed by 
MERFISH here and in (37), and by electron 
microscopy analysis of mouse and human L2 
(59). We find that the human E:I ratio of 2:1 is 
consistent across all areas except V1 in which 
the ratio is 4.5:1, likely as a result of increased 
L4 IT neurons in V1. However, the E:I ratio
varies substantially by layer and is as high as 
10:1 in L6 of V1. Whether this variation can be 
compensated by homeostatic processes re- 
mains to be studied, but these results indicate 
that the E:I ratio can vary substantially in the 
human cortex. 

The current results illustrate the potential of 
single cell transcriptomics to provide a com- 
prehensive cellular map of the cortex that can 
be thought of as a form of quantitative cyto- 
architectonics based on the genes that give the 
cell types their properties. These analyses place 
a cellular lens on thinking about cortical areal 
variation as variation in the proportions and 
properties of the component cell types that de- 
fine the input-output properties of those areas. 
Recent studies have shown that morphological 
and anatomical characteristics are correlated 
with transcriptomic identity (13, 47, 60), indi- 
cating that transcriptomic maps can also be 
highly predictive of cell phenotype variation. 
The present study sampled a small number of 
human tissue donors and further work will be 
required to understand variation of cortical 
gene expression and cell types across diverse
individuals. Another future challenge will be 
creation of a multimodal map inclusive of the 
entire human neocortex where areal sampling 
is guided by detailed anatomical and func- 
tional parcellations that will reveal graded fea- 
tures versus discrete boundaries and enable 
direct linkage between the cellular and func- 
tional architecture of the cortex. 


Materials and Methods 
Postmortem tissue donors 


Males and females 18 to 68 years of age with 
no known history of neuropsychiatric or neu- 
rological conditions, evidence of head trauma, 
intubation, or neuropathology were consid- 
ered for inclusion in this study. De-identified 
postmortem human brain tissue was collected 
after obtaining permission from the decedent’s 
legal next-of-kin. Tissue collection was performed
in accordance with the provisions of
the US Uniform Anatomical Gift Act of 2006
described in the California Health and Safety 
Code section 7150 (effective 1/1/2008) and 
other applicable state and federal laws and 
regulations. The Western Institutional Review 
Board (WIRB) reviewed the use of de-identified 
postmortem brain tissue for research purposes 
and determined that, in accordance with fed- 
eral regulation 45 CFR 46 and associated guid- 
ance, the use of de-identified specimens from 
deceased individuals did not constitute human 
subjects research requiring IRB review. Rou- 
tine serological screening for infectious disease 
(HIV, Hepatitis B, and Hepatitis C) was con- 
ducted where possible using donor blood sam- 
ples and donors negative for these infectious 
diseases were considered for inclusion in the 
study. Tissue RNA quality was assessed using 
samples of total RNA derived from the frontal 
and occipital poles of each donor brain which 
were processed on an Agilent 2100 Bioanalyzer 
using the RNA 6000 Nano kit to generate RNA 
Integrity Number (RIN) scores for each sample.
Specimens with average RIN values ≥7.0
were considered for inclusion in the study. 
Tissue samples from five individuals (3 males, 
2 females, mean postmortem interval 12.8 hours, 
mean age 47 years, table S2) were used for 
snRNA-seq data generation. Tissue samples 
from 3 individuals (3 males, table S2) were used 
for MERFISH data generation. 


Processing of postmortem brain specimens 


Postmortem brain specimens were trans- 
ported to the Allen Institute or the University 
of Washington on ice and processed as pre- 
viously described (https://dx.doi.org/10.17504/ 
protocols.io.bf4ajqse). Briefly, brain specimens 
were bisected through the midline and indi- 
vidual hemispheres were embedded in Cavex 
Impressional Alginate for slabbing. Coronal 
brain slabs were cut at 1 cm intervals for all 
donors except H20.30.002 (table S2), which 
was processed at a slab interval of 4 mm. Tissue
photographs were acquired for all slabs
prior to freezing. Individual slabs were frozen 
in a slurry of dry ice and isopentane. Frozen
slabs were vacuum sealed and stored at −80°C
until the time of use. 


Tissue mapping and dissection 


Cortical areas of interest were identified on 
tissue slab photographs taken at the time of 
autopsy and at the time of dissection. Tissue 
samples used for Cv3 and SSv4 data generation
were on average 3 mm wide, encompassed
the full height of the cortical depth from pia to
white matter of the sampled area (~5 mm), and
were 1 cm in thickness. Tissue photographs were
used to map the tissue blocks sampled for Cv3 
data generation across donors and areas to 
several reference atlases (table S2). First, sam- 
ples were pinned to the Allen Human Refer- 
ence Atlas 3D in MNI volume space, which 
includes labeling of 141 brain structures drawn 
on the ICBM 152 2009b Nonlinear Symmetric 
reference volume, using the publicly available 
BICCN Cell Locator tool (https://github.com/ 
BICCN/cell-locator). Table S2 lists the coor- 
dinates and structure name corresponding to 
the approximate center of each cortical area 
pinned using the Cell Locator tool. As the 
Allen Human 3D Reference makes use of a 
gyral structural ontology the best matching 
structure in the Allen Human Reference plate- 
based 2D and associated Modified Brodmann 
structural ontology (see documentation at 
http://atlas.brain-map.org/) was also determined 
(table S2). Additionally, samples were mapped 
to the Julich Brain Maximum Probability Maps 
in MNI ICBM 152 2009c Nonlinear Asymmetric
reference space (DOI: 10.25493/TAKY-64D)
using Connectome Workbench (https://www. 
humanconnectome.org/software/connectome- 
workbench) for file viewing and annotation 
and mapped structures were cross-referenced 
to the publicly available Julich-Brain v2.9 par- 
cellation (DOI: 10.25493/VSMK-H94) in the 
same 3D reference volume using the EBRAINS 
Siibra Explorer (https://atlases.ebrains.eu/ 
viewer/#/). Table S2 lists the best matching 
(primary) brain structure from the Julich-Brain 
v2.9 parcellation and, for cases where the Julich 
Maximum Probability Maps predict more than 
one cortical area at the reference coordinates 
corresponding to a mapped sample, a secondary 
structure term is listed. In the Allen Institute 
Modified Brodmann ontology, dissections of 
DFC mapped to the superior frontal gyrus 
corresponding to the lateral subdivision of 
Brodmann Area (A) 9 (A9l). Dissections of ACC
corresponded to A24 in the rostral (anterior)
cingulate gyrus. A1 was localized in the transverse
temporal gyrus (Heschl’s gyrus) corresponding
to A41. MTG dissections were mostly
targeted to the caudal subdivision of A21 (A21c),
but some dissections mapped to the intermediate
subdivision of A21 (A21i). M1, S1, and V1
dissections mapped to the primary sensory
regions M1C, S1C, and V1C, respectively. Localization
of V1 was also confirmed by identification
of the Stria of Gennari on tissue slab
photographs. For SSv4 data generation, M1
and S1 dissections targeted the putative hand
and trunk-lower limb sub-regions of each cortical
area. Confirmation of the localization of
tissue blocks in M1 and S1 was also carried out
by processing one block from each donor for 
cryosectioning and fluorescent Nissl staining 
(Neurotrace 500/525, ThermoFisher Scien- 
tific). Nissl-stained sections were screened for 
histological hallmarks of each cortical area 
(such as the presence of Betz cells in L5 of M1) 
to verify that dissected regions were appropri- 
ately localized to either M1 or S1. AnG dissec- 
tions targeted the caudal subdivision of A39 
(A39c). All tissue dissections from parent tis- 
sue slabs were carried out using a custom cold 
table maintained at -20°C for the duration of 
dissection. 


Nuclear isolation and capture 


For SMART-seqv4 (SSv4) and Cv3 with layer 5 
microdissection, tissue blocks were placed in 
ice-cold 1X PBS supplemented with 10mM DL- 
Dithiothreitol (DTT, Sigma Aldrich) and mounted 
on a vibratome (Leica) for sectioning at 500 µm
in the coronal plane. Sections were placed in 
fluorescent Nissl staining solution (Neurotrace 
500/525, ThermoFisher Scientific) prepared in 
1X PBS with 10mM DTT and 0.5% RNasin Plus
RNase inhibitor (Promega) and stained for 5 min 
on ice. After staining, sections were visualized 
on a fluorescence dissecting microscope (Leica) 
and cortical layers were individually microdis- 
sected using a needle blade micro-knife (Fine 
Science Tools) as previously described (https:// 
dx.doi.org/10.17504/protocols.io.bq6ymzfw). 
Nuclear suspensions were prepared from mi- 
crodissected tissue pieces as described (https:// 
dx.doi.org/10.17504/protocols.io.ewovl49p7vr2/ 
v2). Dissected L5 tissue pieces for Cv3 process- 
ing were pooled across multiple sections per 
tissue block to ensure adequate sample for Cv3 
chip loading. For Cv3 processing of tissue blocks 
encompassing all cortical layers, samples were 
placed directly into a Dounce homogenizer 
after removal from the −80°C freezer and processed
as described (https://dx.doi.org/10.17504/
protocols.io.bq64mzgw). 

All samples were immunostained for fluo- 
rescence activated nuclear sorting (FANS) 
with mouse anti-NeuN conjugated to PE (EMD 
Millipore, FCMAB317PE) at a dilution of 1:500 
with incubation for 30 min at 4°C. Control 
samples were incubated with mouse IgG1,κ-PE
Isotype control (BD Pharmingen). A subset of 
SSv4 samples was immunostained with rab- 
bit anti-SATB2 conjugated to Alexa Fluor 647 
(Abcam, ab196536) at a dilution of 1:500 to 
discriminate excitatory (SATB2+/NeuN+) and 
inhibitory (SATB2-/NeuN+) nuclei. After immunostaining,
samples were centrifuged to
concentrate nuclei and were resuspended in 
1X PBS, 1% BSA, and 0.5% RNasin Plus for 
FACS. DAPI (4’, 6-diamidino-2-phenylindole, 
ThermoFisher Scientific) was applied to samples
at a concentration of 0.1 µg/ml. Single
nucleus sorting was carried out on either a BD
FACSAria II SORP or BD FACSAria Fusion instrument
(BD Biosciences) using a 130 µm nozzle.
A standard gating strategy was applied to
all samples as previously described (17). Briefly,
nuclei were gated on their size and scatter 
properties and then on DAPI signal. Doublet 
discrimination gates were applied to exclude 
multiplets. Lastly, samples were gated on NeuN 
signal (PE) and SATB2 (Alexa Fluor 647) signal
where applicable. For Cv3 experiments, NeuN+ 
and NeuN- nuclei were sorted into separate 
tubes and combined at a defined ratio of neurons 
and non-neurons (80% NeuN+, 20% NeuN-), 
except for L5 dissected samples where only 
neuronal (NeuN+) nuclei were captured. Samples
were then centrifuged and resuspended
in 1X PBS, 1% BSA, 0.5% RNasin Plus, and 5 to
10% DMSO and frozen at −80°C until the time
of chip loading. Samples were processed ac- 
cording to the following protocol for chip 
loading (https://dx.doi.org/10.17504/protocols.io.774hrqw).
For SSv4, single nuclei were sorted
into 8-well strip tubes containing 11.5 µl of
SMART-seq v4 collection buffer (Takara) supplemented
with ERCC MIX1 spike-in synthetic
RNAs at a final dilution of 1 × 10⁻⁸ (Ambion).
Strip tubes containing sorted nuclei were
briefly centrifuged and stored at −80°C until
the time of further processing. 


SMART-seqv4 RNA-sequencing 


We used the SMART-Seq v4 Ultra Low Input 
RNA Kit for Sequencing (Takara #634894) 
per the manufacturer’s instructions for reverse 
transcription of RNA and subsequent cDNA 
amplification as described (https://dx.doi.org/ 
10.17504/protocols.io.8epv517xdllb/v2). Standard 
controls were processed alongside each batch 
of experimental samples. Control strips in- 
cluded: 2 wells without cells, 2 wells without 
cells or ERCCs (i.e., no template controls), and 
either 4 wells of 10 pg of Human Universal 
Reference Total RNA (Takara 636538) or 2 
wells of 10 pg of Human Universal Reference 
and 2 wells of 10 pg Control RNA provided in 
the Clontech kit. cDNA was amplified with 21 
PCR cycles after the reverse transcription step. 
cDNA libraries were examined on either an 
Agilent Bioanalyzer 2100 using High Sensi- 
tivity DNA chips or an Advanced Analytics 
Fragment Analyzer (96) using the High Sensitivity
NGS Fragment Analysis Kit (1 bp to
6000 bp). Purified cDNA was stored in 96-well
plates at −20°C until library preparation.

The NexteraXT DNA Library Preparation 
(Illumina FC-131-1096) kit with NexteraXT 
Index Kit V2 Sets A to D (FC-131-2001, 2002,
2003, or 2004) was used for sequencing library 
preparation as described (7). NexteraXT DNA 
Library prep was done at either 0.5x volume 
manually or 0.4x volume on the Mantis in- 
strument (Formulatrix, https://dx.doi.org/ 
10.17504/protocols.io.brdjm24n). Samples were 
quantitated using PicoGreen on a Molecular 
Dynamics M2 SpectraMax instrument. Sequencing
libraries were assessed using either
an Agilent Bioanalyzer 2100 with High Sen- 
sitivity DNA chips or an Advanced Analytics 
Fragment Analyzer with the High Sensitivity 
NGS Fragment Analysis Kit for sizing. Molarity 
was calculated for each sample using average 
size as reported by Bioanalyzer or Fragment 
Analyzer and pg/µl concentration as determined
by PicoGreen. Samples were normalized to 2 
to 10 nM with Nuclease-free Water (Ambion). 
Libraries were multiplexed at 96 samples/lane 
and sequenced on an Illumina HiSeq 2500 
instrument using Illumina High Output V4 
chemistry. 
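
The molarity calculation described above can be illustrated with a minimal Python sketch. It assumes an average molar mass of 660 g/mol per base pair of double-stranded DNA, a common convention that is not stated in the text, and the function names are hypothetical.

```python
# Assumption (not from the text): dsDNA averages ~660 g/mol per base pair.
DSDNA_G_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_pg_per_ul: float, avg_size_bp: float) -> float:
    """Convert a dsDNA concentration (pg/µl) and mean fragment size (bp) to nM.

    pg/µl equals 1e-6 g/L; dividing by the molar mass gives mol/L, and
    multiplying by 1e9 gives nmol/L, so the net unit factor is 1e3.
    """
    return conc_pg_per_ul * 1e3 / (DSDNA_G_PER_MOL_PER_BP * avg_size_bp)


def dilution_factor(current_nm: float, target_nm: float) -> float:
    """Fold dilution needed to reach the target molarity (values above 1 mean dilute)."""
    return current_nm / target_nm


# Example: a 500 bp library measured at 1320 pg/µl is 4 nM,
# so a 2-fold dilution brings it to 2 nM.
molarity = library_molarity_nm(1320.0, 500.0)
print(molarity, dilution_factor(molarity, 2.0))
```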


SMART-seqv4 RNA-seq gene 
expression quantification 


Raw read (fastq) files were aligned to the GRCh38 
human genome sequence (Genome Reference 
Consortium, 2011) with the RefSeq transcrip- 
tome version GRCh38.p2 (current as of 4/13/ 
2015) and updated by removing duplicate 
Entrez gene entries from the gtf reference file 
for STAR processing. For alignment, Illumina
sequencing adapters were clipped from the
reads using the fastqMCF program (61). After
clipping, the paired-end reads were mapped 
using Spliced Transcripts Alignment to a Ref- 
erence (STAR) (62) using default settings. Reads 
that did not map to the genome were then 
aligned to synthetic constructs (External RNA 
Controls Consortium, ERCC) and the E. coli genome
(version ASM584v2). The results files included
quantification of the mapped reads (raw
exon and intron counts for the
transcriptome-mapped reads), and percentages of reads mapped
to the RefSeq transcriptome, to ERCC spike-in
controls, and to E. coli. Quantification was
performed using summarizeOverlaps from the
R package GenomicAlignments (63).
Expression was calculated as counts per mil- 
lion (CPM) of exonic plus intronic reads, and 
log2(CPM + 1) transformed values were used 
for a subset of analyses as described below. Gene 
detection was calculated as the number of genes 
expressed in each sample with CPM > 0. CPM 
values reflected absolute transcript number 
and gene length. Short and abundant tran- 
scripts may have the same apparent expression 
as long but rarer transcripts. Intron retention 
varied across genes, so no reliable estimates 
of effective gene lengths were available for 
expression normalization. Instead, absolute 
expression was estimated as fragments per 
kilobase per million (FPKM) using only exonic 
reads so that annotated transcript lengths could
be used. 
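
The expression measures described above can be sketched in a few lines of Python. This is an illustration on toy numbers rather than the pipeline used in the study; the function names are hypothetical, and counts are assumed to be per-gene read counts for a single nucleus.

```python
import math


def cpm(counts):
    """Counts per million for one sample (list of per-gene read counts)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1e6 / total for c in counts]


def log2_cpm_plus_one(counts):
    """log2(CPM + 1) transform."""
    return [math.log2(v + 1.0) for v in cpm(counts)]


def genes_detected(counts):
    """Number of genes with CPM > 0."""
    return sum(1 for v in cpm(counts) if v > 0)


def fpkm(exon_counts, transcript_lengths_bp):
    """Fragments per kilobase per million, from exonic counts and transcript lengths."""
    total = sum(exon_counts)
    if total == 0:
        return [0.0 for _ in exon_counts]
    return [c * 1e9 / (total * length)
            for c, length in zip(exon_counts, transcript_lengths_bp)]


# Toy example: read counts scale with transcript length here, so CPM differs
# between genes while length-normalized FPKM is equal for the expressed genes.
counts = [10, 0, 30, 60]
lengths = [1000, 2000, 3000, 6000]
print(cpm(counts))
print(fpkm(counts, lengths))
```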


10x Chromium RNA-sequencing and 
expression quantification 


Samples were processed using the 10x Chro- 
mium Single-Cell 3’ Reagent Kit v3 following the 
manufacturer’s protocol as described (https:// 
dx.doi.org/10.17504/protocols.io.bq7cmziw). Gene 
expression was quantified using the default 10x 
Cell Ranger v3 (Cell Ranger, RRID:SCR_017344)
pipeline. The human reference genome used 
included the modified genome annotation de- 
scribed above for SMART-seq v4 quantifica- 
tion. Introns were annotated as “mRNA” and 
intronic reads were included in expression 
quantification. 


RNA-sequencing processing and clustering 
Cell-type label transfer 


Human M1 reference taxonomy subclass labels 
(12) were transferred to nuclei in the current 
MTG dataset using Seurat’s label transfer (3000 
high variance genes using the ‘vst’ method then 
filtered through exclusion list). This was car- 
ried out for each RNA-seq modality dataset; 
for example, human-Cv3 and human-SSv4 were 
labeled independently. Each dataset was sub- 
divided into 5 neighborhoods—IT and Non-IT 
excitatory neurons, CGE- and MGE-derived in- 
terneurons, and non-neuronal cells—based on 
marker genes and transferred subclass labels 
from published studies of human and mouse 
cortical cell types and cluster grouping rela- 
tionships in a reduced dimensional gene ex- 
pression space. 


Filtering low-quality nuclei 


SSv4 nuclei were included for analysis if they 
passed all QC criteria: 

> 30% cDNA longer than 400 base pairs 

> 500,000 reads aligned to exonic or intronic 
sequence 

> 40% of total reads aligned 

> 50% unique reads 

> 0.7 TA nucleotide ratio 
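
The SSv4 QC criteria listed above amount to a conjunction of five thresholds, which can be sketched as follows. The record type and its field names are hypothetical, and the comparisons are taken to be strict, as written in the list.

```python
from dataclasses import dataclass


@dataclass
class NucleusQC:
    """Per-nucleus QC metrics (hypothetical field names)."""
    pct_cdna_over_400bp: float   # percent of cDNA longer than 400 base pairs
    reads_exon_intron: int       # reads aligned to exonic or intronic sequence
    pct_reads_aligned: float     # percent of total reads aligned
    pct_unique_reads: float      # percent unique reads
    ta_ratio: float              # TA nucleotide ratio


def passes_ssv4_qc(n: NucleusQC) -> bool:
    """Return True only if the nucleus passes all five criteria."""
    return (
        n.pct_cdna_over_400bp > 30.0
        and n.reads_exon_intron > 500_000
        and n.pct_reads_aligned > 40.0
        and n.pct_unique_reads > 50.0
        and n.ta_ratio > 0.7
    )


good = NucleusQC(45.0, 1_200_000, 85.0, 70.0, 0.9)
low_depth = NucleusQC(45.0, 300_000, 85.0, 70.0, 0.9)
print(passes_ssv4_qc(good), passes_ssv4_qc(low_depth))
```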

QC was then performed at the neighborhood 
level. Neighborhoods were integrated together 
across all areas and modality; for example, deep 
excitatory neurons from human-Cv3, human- 
Cv3-Layer5 and human-SSv4 datasets were 
integrated using Seurat integration functions 
with 2000 high variance genes. Integrated 
neighborhoods were Louvain clustered into 
over 100 meta cells, and low-quality meta cells
were removed from the dataset based on rela- 
tively low UMI or gene counts (included glia 
and neurons with greater than 500 and 1000 
genes detected, respectively), predicted dou- 
blets (include nuclei with doublet scores under 
0.3), and/or subclass label prediction metrics 
within the neighborhood (excitatory labeled 
nuclei that clustered with majority inhibitory 
or non-neuronal nuclei). 
RNA-seq clustering

Nuclei were normalized using SCTransform 
(64), and neighborhoods were integrated to- 
gether within an area and across individuals 
and modalities by identifying mutual near- 
est neighbor anchors and applying canonical 
correlation analysis as implemented in Seurat 
(65). For example, deep excitatory neurons from 
human-Cv3 were split by individuals and integ- 
rated with the human-SSv4 deep excitatory neu- 
rons. Integrated neighborhoods were Louvain 
clustered into over 100 meta cells. Meta cells 
were then merged with their nearest neighboring
meta cell until merging criteria were met,
a split and merge approach that has
been previously described (12). The remaining
clusters underwent further QC to exclude
low-quality and outlier populations. These exclusion
criteria were based on irregular groupings
of metadata features that resided within a
cluster. 


Defining cross-area consensus cell types 


For each neighborhood, Cv3 nuclei were inte- 
grated together across individuals. The inte- 
grated latent space was Louvain clustered into 
over 100 meta cells. Meta cells were then merged 
with their nearest neighboring meta cell until 
merging criteria were met, a split and merge
approach that has been previously described 
(12) and was also used to define the within- 
area cluster identities. The process was repeated 
for each neighborhood, with an example dia- 
gram of the workflow shown in Fig. 5A. 


Cell-type taxonomy generation 


For each area, a taxonomy was built using the 
final set of clusters and was annotated using 
subclass mapping scores, dendrogram relation- 
ships, marker gene expression, and inferred 
laminar distributions. Within-area taxonomy 
dendrograms were generated using the build_dend
function from the scrattch.hicat R package. A
matrix of cluster median log2(cpm + 1) expression
across the 3000 high-variance genes for
Cv3 nuclei from a given area was used as input.
The cross-area dendrogram was generated
with a similar workflow but was downsam- 
pled to a maximum of 100 nuclei per cross- 
area cluster per area. The 3000 high-variance
genes used for dendrogram construction were 
identified from the downsampled matrix con- 
taining Cv3 nuclei from all eight areas. 


Cell-type comparisons across cortical areas 
Differential gene expression 


To identify subclass marker genes within an 
area, Cv3 datasets from each area were down- 
sampled to a maximum of 100 nuclei per 
cluster per individual. Differentially expressed 
marker genes were then identified using the 
FindAllMarkers function from Seurat, using 
the Wilcoxon rank sum test on log-normalized
matrices with a maximum of 500 nuclei per
group (subclass versus all other nuclei as
background). Statistical thresholds for mark- 
ers are indicated in their respective figures. To 
identify area marker genes across subclasses, 
Cv3 datasets from each area were downsampled 
to a maximum of 50 nuclei per cluster per in- 
dividual. Downsampled counts matrices were 
then grouped into pseudo-bulk replicates (area, 
individual, subclass) and the counts were 
summed per replicate. DESeq2 functionality 
was then used to perform a differential ex- 
pression analysis between area pairs (or com- 
parisons of interest) for each subclass using 
the Wald test statistic. 
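
The downsampling and pseudo-bulk aggregation that precede the DESeq2 step can be sketched as below. This is a minimal Python illustration, not the code used in the study (which relied on R packages); the dictionary keys, function names, and random seed are assumptions.

```python
import random
from collections import defaultdict


def downsample(nuclei, max_per_group, seed=0):
    """Keep at most max_per_group nuclei per (area, cluster, individual)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for n in nuclei:
        groups[(n["area"], n["cluster"], n["individual"])].append(n)
    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return kept


def pseudobulk(nuclei):
    """Sum raw counts per (area, individual, subclass) replicate."""
    sums = {}
    for n in nuclei:
        key = (n["area"], n["individual"], n["subclass"])
        if key not in sums:
            sums[key] = [0] * len(n["counts"])
        sums[key] = [a + b for a, b in zip(sums[key], n["counts"])]
    return sums


nuclei = [
    {"area": "V1", "individual": "d1", "subclass": "L4 IT", "cluster": "c1", "counts": [1, 2]},
    {"area": "V1", "individual": "d1", "subclass": "L4 IT", "cluster": "c1", "counts": [3, 4]},
    {"area": "M1", "individual": "d1", "subclass": "L4 IT", "cluster": "c1", "counts": [5, 6]},
]
print(pseudobulk(downsample(nuclei, max_per_group=50)))
```

The summed replicate vectors would then be passed to a count-based differential expression tool; that step is not reproduced here.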


Transcriptomic entropy across areas 


To quantify intercell transcriptomic hetero- 
geneity across areas for each subclass we cal- 
culated the transcriptomic entropy in the 
observed data (structured) and compared against 
entropy in permuted data (unstructured). 
Transcriptomic heterogeneity is defined as 
the difference between the structured and 
unstructured entropy. To compute transcrip- 
tomic entropy, we followed these steps: (1) 
Randomly down-sample the cells within each 
subclass by taking 250 cells from each cross- 
area cell type. (2) Identify the highly variable 
genes in each area and take the union of genes 
as our set of interest. (3) Then, by following a 
recently reported computational approach to 
quantify transcriptomic heterogeneity (66), 
we computed the per-area transcriptomic en- 
tropy for each subclass. 


Identifying changes in cell-type proportions 
across areas 


Cell-type proportions are compositional, where 
the gain or loss of one population necessarily 
affects the proportions of the others, so we 
used scCODA (25) to determine which changes 
in cell class, subclass, and cell-type proportions 
across areas were statistically significant. We
analyzed neuronal and non-neuronal popula- 
tions separately because nuclei were sorted 
based on NeuN immunostaining to enrich for 
neurons. The proportion of each cell type was 
estimated using a Bayesian approach where 
proportion differences across individuals were 
used to estimate the posterior. All composi- 
tional and categorical analyses require a ref- 
erence population to describe differences 
with respect to and, because we were uncer- 
tain which populations should be unchanged, 
we iteratively used each cell type and each 
area as a reference when computing abundance
changes. To account for sex differences,
we included it as a covariate when testing for 
abundance changes. Separately for neuronal 
and non-neuronal populations, we reported 
the effect size of each area for each cell type 
(table S10) and used a mean inclusion prob- 
ability cutoff of 0.7 for calling a population 
consistently different. 
Partitioning variation in gene expression
across areas 

Variation partitioning analysis was performed 
to prioritize the drivers of variation across 
areas within each subclass. Using linear mixed-effect
models implemented in the variancePartition
Bioconductor package (http://bioconductor.org/packages/variancePartition) (66), we
identified genes whose variance is best explained
along the M-L (anatomical left to right), R-C,
and D-V axes as well as by cortical area and 
donor. The order of areas along these axes 
was defined based on the approximate x, y, 
and z coordinates of tissue samples based on 
a common coordinate framework of the adult 
human brain (20) (table S2). Genes were re- 
moved from the analysis based on the follow- 
ing criteria: (1) expressed in less than 10 cells, 
(2) greater than 80% dropout rate, (3) zero 
variance in expression, and (4) expression less 
than 1 CPM on average. The variance parti- 
tioning linear mixed-effect model was then 
defined as: 


Gene ~ medial_lateral + rostral_caudal +
dorsal_ventral + (1|area) + (1|donor)


and passed into the variancePartition function
fitVarPartModel(). We determined the
amount of variation explained per covariate for
each gene from the extractVarPart() function.
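
The four gene-removal criteria that precede model fitting can be sketched as a single predicate. This Python illustration assumes per-cell CPM values for one gene within one subclass; the function name is hypothetical, and the mixed-model fit itself (done with variancePartition in R) is not reproduced.

```python
from statistics import mean, pvariance


def keep_gene(cpm_values, min_cells=10, max_dropout=0.8, min_mean_cpm=1.0):
    """Return True if a gene survives all four removal criteria."""
    n_expressing = sum(1 for v in cpm_values if v > 0)
    dropout = 1.0 - n_expressing / len(cpm_values)
    if n_expressing < min_cells:        # (1) expressed in fewer than 10 cells
        return False
    if dropout > max_dropout:           # (2) greater than 80% dropout rate
        return False
    if pvariance(cpm_values) == 0:      # (3) zero variance in expression
        return False
    if mean(cpm_values) < min_mean_cpm: # (4) less than 1 CPM on average
        return False
    return True


variable_gene = [0] * 5 + list(range(1, 16))   # 15 of 20 cells express it
constant_gene = [5.0] * 20                     # no variance across cells
print(keep_gene(variable_gene), keep_gene(constant_gene))
```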


In situ profiling of gene expression 
Human postmortem frozen brain tissue was 
embedded in Optimum Cutting Temperature 
medium (VWR, 25608-930) and sectioned on
a Leica cryostat at -17°C at 10 µm onto Vizgen
MERSCOPE coverslips (VIZGEN 2040003). These 
sections were then processed for MERSCOPE 
imaging according to the manufacturer’s in- 
structions. Briefly: sections were allowed to ad- 
here to these coverslips at room temperature 
for 10 min prior to a 1 min wash in nuclease- 
free phosphate buffered saline (PBS) and fix- 
ation for 15 min in 4% paraformaldehyde in 
PBS. Fixation was followed by 3x5 min washes 
in PBS prior to a 1 min wash in 70% ethanol. 
Fixed sections were then stored in 70% ethanol 
at 4°C prior to use and for up to one month.
Human sections were photobleached using a 
150W LED array for 72 hours at 4°C prior to 
hybridization then washed in 5 ml Sample 
Prep Wash Buffer (VIZGEN 20300001) in a 
5 cm petri dish. Sections were then incubated 
in 5 ml Formamide Wash Buffer (VIZGEN 
20300002) at 37°C for 30 min. Sections were hybridized
by placing 50 µl of VIZGEN-supplied
Gene Panel Mix onto the section, covering 
with parafilm and incubating at 37°C for 36 to 
48 hours in a humidified hybridization oven. 
Following hybridization, sections were washed 
twice in 5 ml Formamide Wash Buffer for 
30 min at 47°C. Sections were then embedded 
in acrylamide by polymerizing VIZGEN Em- 
bedding Premix (VIZGEN 20300004) according 
to the manufacturer’s instructions. Sections
were embedded by inverting sections onto
110 µl of Embedding Premix and 10% Ammonium
Persulfate (Sigma A3678) and TEMED
(BioRad 161-0800) solution applied to a Gel 
Slick (Lonza 50640) treated 2x3 glass slide. 
The coverslips were pressed gently onto the 
acrylamide solution and allowed to polymer- 
ize for 1.5 hours. Following embedding, sec- 
tions were cleared for 24 to 48 hours with a 
mixture of VIZGEN Clearing Solution (VIZGEN 
20300003) and Proteinase K (New England 
Biolabs P8107S) according to the manufacturer’s
instructions. Following clearing, sections were 
washed twice for 5 min in Sample Prep Wash 
Buffer (PN 20300001). VIZGEN DAPI and PolyT 
Stain (PN 20300021) was applied to each sec- 
tion for 15 min followed by a 10 min wash in 
Formamide Wash Buffer. Formamide Wash 
Buffer was removed and replaced with Sam- 
ple Prep Wash Buffer during MERSCOPE set 
up. 100 µl of RNase Inhibitor (New England
BioLabs M0314L) was added to 250 µl of Imaging
Buffer Activator (PN 203000015) and this
mixture was added through the cartridge acti- 
vation port to a pre-thawed and mixed MERSCOPE 
Imaging cartridge (VIZGEN PN104.0004). 15 ml 
mineral oil (Millipore-Sigma m5904-6X500ML) 
was added to the activation port and the 
MERSCOPE fluidics system was primed ac- 
cording to VIZGEN instructions. The flow cham- 
ber was assembled with the hybridized and 
cleared section coverslip according to VIZGEN 
specifications and the imaging session was 
initiated after collection of a 10X mosaic DAPI 
image and selection of the imaging area. For 
specimens that passed the minimum count 
threshold, imaging was initiated, and process- 
ing completed according to VIZGEN’s pro- 
prietary protocol. Following processing and 
segmentation through MERSCOPE software, 
cells with fewer than 50 counts, or with an area 
outside the 100 to 300 µm² range were eliminated
from the mapping process.

The 140 gene human cortical panel was se- 
lected using a combination of manual and 
algorithmic based strategies requiring a refer- 
ence single cell/nucleus RNA-seq data set from 
the same tissue, in this case the human MTG 
snRNAseq dataset and resulting taxonomy (17). 
First, an initial set of high-confidence marker 
genes are selected through a combination of 
literature search and analysis of the reference 
data. These genes are used as input for a greedy 
algorithm (detailed below). Second, the refer- 
ence RNA-seq data set is filtered to only include 
genes compatible with mFISH. Retained genes 
need to 1) be long enough to allow probe de- 
sign (> 960 base pairs); 2) be expressed high- 
ly enough to be detected (FPKM >= 10), but 
not so high as to overcrowd the signal of other 
genes in a cell (FPKM < 500); 3) have low ex- 
pression in off-target cells (FPKM < 50 in non- 
neuronal cells); and 4) be differentially expressed 
between cell types (top 500 remaining genes
by marker score20). To sample each cell type 
more evenly, the reference data set is also 
filtered to include a maximum of 50 cells per 
cluster. 

The main step of gene selection uses a greedy 
algorithm to iteratively add genes to the ini- 
tial set. To do this, each cell in the filtered refer- 
ence data set is mapped to a cell type by taking 
the Pearson correlation of its expression levels 
with each cluster median using the initial gene 
set of size n, and the cluster corresponding to 
the maximum value is defined as the “mapped 
cluster”. The “mapping distance” is then de- 
fined as the average cluster distance between 
the mapped cluster and the originally assigned 
cluster for each cell. In this case a weighted 
cluster distance, defined as one minus the 
Pearson correlation between cluster medians 
calculated across all filtered genes, is used to 
penalize cases where cells are mapped to very 
different types, but an unweighted distance, 
defined as the fraction of cells that do not map 
to their assigned cluster, could also be used. 
This mapping step is repeated for every pos- 
sible n+1 gene set in the filtered reference 
data set, and the set with minimum cluster 
distance is retained as the new gene set. These 
steps are repeated using the new gene set (of
size n+1) until a gene panel of the desired 
size is attained. Code for reproducing this 
gene selection strategy is available as part of 
the mfishtools R library (https://github.com/ 
AllenInstitute/mfishtools). 
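
For readers who want the logic in compact form, the greedy loop described above might be sketched as below. This is an illustrative Python sketch under stated assumptions, not the mfishtools API: all function names are hypothetical, expression is passed as plain lists (one row per cell, one column per filtered gene), genes are referred to by column index, and the initial gene set is assumed to contain at least two genes so that the correlation is defined. Ties are broken by taking the first candidate, which is an arbitrary choice.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cluster_medians(expr, labels):
    """Median expression of every gene within each cluster."""
    n_genes = len(expr[0])
    medians = {}
    for cluster in sorted(set(labels)):
        rows = [row for row, lab in zip(expr, labels) if lab == cluster]
        medians[cluster] = [median(row[g] for row in rows) for g in range(n_genes)]
    return medians


def mapping_distance(expr, labels, medians, cluster_dist, gene_set):
    """Average cluster distance between each cell's mapped and assigned cluster."""
    total = 0.0
    for row, assigned in zip(expr, labels):
        cell = [row[g] for g in gene_set]
        # Mapped cluster = cluster whose median correlates best on this gene set.
        mapped = max(
            medians,
            key=lambda c: pearson(cell, [medians[c][g] for g in gene_set]),
        )
        total += cluster_dist[assigned][mapped]
    return total / len(expr)


def greedy_panel(expr, labels, initial_genes, panel_size):
    """Grow the gene panel one gene at a time, minimizing the mapping distance."""
    medians = cluster_medians(expr, labels)
    # Weighted cluster distance: 1 - Pearson between medians across all filtered genes.
    cluster_dist = {
        a: {b: 0.0 if a == b else 1.0 - pearson(medians[a], medians[b]) for b in medians}
        for a in medians
    }
    panel = list(initial_genes)
    while len(panel) < panel_size:
        candidates = [g for g in range(len(expr[0])) if g not in panel]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda g: mapping_distance(expr, labels, medians, cluster_dist, panel + [g]),
        )
        panel.append(best)
    return panel
```

Replacing `cluster_dist` with a 0/1 indicator of mismatch would give the unweighted variant mentioned in the text (the fraction of cells that do not map to their assigned cluster).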


Cell-type mapping of MERSCOPE data 


Any genes not matched across both the MER- 
SCOPE gene panel and the snRNASeq mapping 
taxonomy were filtered from the snRNASeq 
dataset. We calculated the mean gene expres- 
sion for each gene in each snRNAseq cluster. 
We assigned MERSCOPE cells to snRNAseq 
clusters by finding the nearest cluster to the 
mean expression vectors of the snRNASeq 
clusters using the cosine distance. All scripts 
and data used are available at: https://github. 
com/AllenInstitute/human_cross_areal. 
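
The mapping procedure described above amounts to a nearest-centroid assignment under cosine distance. The following Python sketch illustrates the idea under stated assumptions; it is not taken from the linked repository. All function names are hypothetical, and expression values are passed as plain lists.

```python
import math


def shared_genes(panel_genes, reference_genes):
    """Genes present in both the spatial panel and the reference, in panel order."""
    reference = set(reference_genes)
    return [g for g in panel_genes if g in reference]


def cluster_means(expr, labels):
    """Mean expression of each gene within each reference cluster."""
    sums, counts = {}, {}
    for row, lab in zip(expr, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(row)
            counts[lab] = 0
        for j, value in enumerate(row):
            sums[lab][j] += value
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine_distance(u, v):
    """One minus cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def assign_cells(cells, means):
    """Assign each spatial cell to the cluster with the nearest mean vector."""
    return [
        min(means, key=lambda lab: cosine_distance(cell, means[lab]))
        for cell in cells
    ]
```

Cosine distance ignores overall magnitude, which is one reason it is a common choice when comparing imaging-based counts with sequencing-based expression; the text does not state the authors' rationale.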


GFAP Immunofluorescence 


Tissue blocks from cortical areas of interest 
were removed from fresh-frozen tissue slabs 
as described above. Immediately after dissec- 
tion, tissue blocks were drop-fixed in cold 4% 
paraformaldehyde overnight in a 4°C fridge. 
Tissue blocks were then rinsed in multiple 
washes of 1X PBS, cryoprotected in sequential 
15% and 30% sucrose solutions, and embedded 
in OCT. Sections were cut free floating at 30 μm 
in the coronal plane on a Leica cryostat into 
1X PBS and were stored at 4°C or at −20°C in 
cryoprotectant solution until the time of use. 
Sections were processed for immunofluores- 
cence using a rabbit polyclonal anti-GFAP 
antibody (Agilent, Z0334) at a dilution of 1:1000 


Jorstad et al., Science 382, eadf6812 (2023) 


BRAIN CELL CENSUS 


and mouse monoclonal anti-NeuN antibody 
(Millipore Sigma, MAB377) at a dilution of 
1:1000. Primary antibodies were incubated 
overnight at 4°C, followed by incubation in 
Alexa Fluor conjugated secondary species- 
specific antibodies for 2 hours at room tem- 
perature. Sections were counterstained with 
DAPI and Neurotrace 500 fluorescent Nissl 
stain and were mounted in ProLong Gold Anti- 
fade Mountant (ThermoFisher Scientific). Sec- 
tions were imaged on a Nikon TiE fluorescence 
microscope equipped with NIS-Elements Ad- 
vanced Research imaging software (v4.20). 
GFAP processes were traced using the SNT 
plugin in the Fiji distribution of ImageJ. 
Signature morphoelectric properties of diverse 
GABAergic interneurons in the human neocortex 
INTRODUCTION: Recent studies using single-cell 
transcriptomic analysis have shed new light 
on genetically defined cell types in the human 
brain. However, a deeper understanding of multi- 
modal cellular properties, such as electrical 
activity and morphology, remains elusive and 
is central to understanding the role of distinct 
cell types in cognitive function and disease. 


RATIONALE: Understanding the fundamental 
properties of different cell types is important 
for gaining insights into their role in neural cir- 
cuits, cognition, and disease. This is challeng- 
ing for cell types in the human brain because of 
the limited availability of tissue, the heteroge- 
neity of neurons, and the lack of genetic ap- 
proaches to effectively target these spatially 
distributed and in some cases, exceedingly rare 
types. Rapid viral genetic labeling and patch- 
clamp electrophysiology combined with RNA 
sequencing (Patch-seq) facilitates the target- 
ing of specific cell types in tissue from human 
neurosurgeries and thus allows the character- 
ization of multimodal properties of neurons in 
the human cortex. 


Patch-seq analysis of human GABAergic 
neurons. (A) MRIs are obtained to accu- 
rately map the area of resected human 
brain tissue. (B) Viral labeling combined 
with Patch-seq is used to collect the 
multimodal properties of human cortical 
GABAergic neurons. Each sample is 
mapped to a taxonomy to identify the 
transcriptomic type. (C) Representative 
GABAergic subclasses, their morphologies, 
and action-potential firing patterns. 

(D) Transcriptomic type SST FRZB is found 
along a continuum of electrophysiological 
and morphological features between the SST 
and PVALB subclasses. (E) Morphologically 
defined double-bouquet cells map to both 
SST CALB1 and SST ADGRG6, each dis- 
playing a distinct distribution across the 
cortical layers. (F) Homology mapping of 
PVALB types between mice and humans 
reveals morphological and electrophysio- 
logical differences, including larger spatial 
extent and higher excitability of human 
neurons. 
[Figure panel labels: (A) MRI delineating region; (D) Alignment of SST FRZB neurons with PVALB subclass.] 
RESULTS: Patch-seq sampling facilitated tar- 
geted acquisition and analysis of 778 human 
neurons in cortical layers 2 to 6, across 44 out of 
45 γ-aminobutyric acid-producing (GABAergic) 
transcriptomic types. Aggregated data from 
acute and culture paradigms provide new de- 
scriptions and direct comparison of the signa- 
ture morphoelectric properties of the canonical 
human interneuron subclasses, LAMP5/PAX6, 
VIP, SST, and PVALB, and a deeper dive into fea- 
tures of select GABAergic transcriptomic types. 

Detailed characterization of the SST subclass 
revealed specific multimodal properties of in- 
dividual transcriptomic types. The SST FRZB 
transcriptomic type exhibits gene expression 
signatures along a continuum of PVALB and 
SST subclasses, but the morphoelectric proper- 
ties of this type clearly indicate strong alignment 
with the PVALB subclass. For an additional tran- 
scriptomic type, SST CALB1, we found multiple 
discrete morphological types suggesting that 
further splitting may be warranted. Double- 
bouquet cells of the human neocortex belong 
to two transcriptomic types within the SST 
subclass with enrichment in temporal versus 
frontal cortex regions. We compared the mor- 
phoelectric properties of homologous mouse 
and human neocortical GABAergic neuron 
types and found that human types are more 
excitable and have a larger spatial extent with 
less neurite branching. 


CONCLUSION: An impactful finding of our study 
is the direct demonstration of how multimodal 
Patch-seq data is vital to refinement of tran- 
scriptomic cell-type taxonomies. Cellular tax- 
onomies built on single-cell transcriptomes 
and differential gene expression are not static 
but rather represent a starting foundation to 
build upon as new data modalities are obtained 
and aligned at the resolution of transcriptomic 
types. We also demonstrate the immense po- 
tential and utility of viral genetic labeling and 
Patch-seq for targeted analysis of human neo- 
cortical GABAergic neuron subclasses and types 
in ex vivo brain slices. This work can serve as 
a roadmap for future functional studies of hu- 
man brain cell types at the resolution of emerg- 
ing transcriptomic cell-type taxonomies and 
provides a rich open-access dataset for explor- 
ing gene-function relationships for a wide 
diversity of human neocortical GABAergic neu- 
ron types. 
Human cortex transcriptomic studies have revealed a hierarchical organization of γ-aminobutyric 
acid-producing (GABAergic) neurons from subclasses to a high diversity of more granular types. 

Rapid GABAergic neuron viral genetic labeling plus Patch-seq (patch-clamp electrophysiology plus 
single-cell RNA sequencing) sampling in human brain slices was used to reliably target and analyze 
GABAergic neuron subclasses and individual transcriptomic types. This characterization elucidated 
transitions between PVALB and SST subclasses, revealed morphological heterogeneity within an 
abundant transcriptomic type, identified multiple spatially distinct types of the primate-specialized 
double bouquet cells (DBCs), and shed light on cellular differences between homologous mouse 
and human neocortical GABAergic neuron types. These results highlight the importance of multimodal 
phenotypic characterization for refinement of emerging transcriptomic cell type taxonomies and for 
understanding conserved and specialized cellular properties of human brain cell types. 


Recent work on the human brain has shed 
light on transcriptomic cell type diver- 
sity, including the identification of pre- 
viously undescribed cell types and finer 
distinctions than previously recognized 
(1-11). However, a deeper understanding of the 
phenotypic properties of transcriptomic types 
is necessary to provide mechanistic insights 
into their role in cognitive function and disease. 
Although it is hypothesized in transcriptomic 
type classification that differentially expressed 
genes within each transcriptomic type underlie 
distinct morphological and functional prop- 
erties, direct measurements are lagging well 
behind the pace of transcriptomic characteri- 
zation. Additionally, there exists unresolved 
ambiguity across, and diversity within, tran- 
scriptomic types that remain underexplored 
for many different brain regions and species. 
The Patch-seq method (patch-clamp electro- 
physiology plus single-cell RNA sequencing) 
provides the most direct means to determine 
the fundamental morphoelectric properties of 
neuronal transcriptomic types, evaluate dis- 
tinctness between types, and explore gene- 
function relationships (12-14). Recently, this 
approach has been used successfully to char- 
acterize the most abundant glutamatergic 
pyramidal neuron types in the human supra- 
granular cortex (15). This work provided a deep 
functional context regarding the expanded di- 
versity of human glutamatergic pyramidal 
neuron types relative to those of the mouse 
supragranular cortex, as well as the elucida- 
tion of the transcriptomic identity of human 
cortical cell types with selective vulnerability 
in disease. Additionally, gradients in gene ex- 
pression were largely mirrored by gradients in 
other measured morphoelectric features, con- 
sistent with other recent Patch-seq studies of 
mouse cortical neuron types (16, 17). The Patch- 
seq technique has also been used to measure the 
multimodal cellular properties of cortical layer 5 
extratelencephalic-projecting (L5 ET) neurons 
in the human cortex and to apply homology 
mapping based on analysis of single-cell tran- 
scriptomes for comparing the cellular features 
of this subcortically projecting cortical neuron 
type across mammalian species (18). 

γ-Aminobutyric acid-producing (GABAergic) 
neurons are crucial for modulating and tuning 
neuronal circuits (19), and their dysfunction is 
at the forefront of a variety of human brain dis- 
orders (20-22), yet a comprehensive character- 
ization of human cortical GABAergic neuron 
types has proven technically far more challeng- 
ing. This stands in contrast to decades of im- 
pressive advances in the mouse model, where 
development and refinement of a multitude of 
viral and transgenic targeting and in vivo per- 
turbation strategies have facilitated rapid pro- 
gress (23). GABAergic neurons in the cortex 
historically have been subdivided into four 
major subclasses based on the neurochemical 
markers they express: parvalbumin (PVALB), 
somatostatin (SST), vasoactive intestinal pep- 
tide (VIP), and LAMP5/PAX6 [synonymous with 
5-hydroxytryptamine receptor 3A (HTR3A)- 
expressing, but lacking VIP] (23, 24). Single- 
cell transcriptomics studies of the mouse and 
human cortex revealed previously unrecog- 
nized diversity of neuron types within each of 
these four canonical interneuron subclasses 
(1, 25, 26). Mouse Patch-seq studies of cortical 
GABAergic neurons have shown discrete mor- 
phological and electrophysiological proper- 
ties between the subclasses, yet a continuum 
within each subclass (16, 17). 

By comparison, similar studies of the hu- 
man cortex have proven challenging owing to 
the limited availability of adult human brain 
tissue and lack of genetic approaches to effec- 
tively target these spatially distributed, and in 
some cases, exceedingly rare types. Prior single- 
neuron electrophysiology studies of human cor- 
tical GABAergic neurons are relatively few and 
have relied mainly on firing patterns, such as 
fast spiking (FS) versus non-FS, limited marker 
genes, or gene panels to achieve approximate 
alignment to subclasses described in rodents 
(3, 27-33). The previous lack of a comprehen- 
sive human cortical cell-type taxonomy as a 
reference for these studies likely hindered ef- 
forts at classification, and the degree of conser- 
vation between human and mouse neocortical 
GABAergic cell-type diversity and marker-gene 
expression has only partially been delineated. 
Here we combined rapid viral genetic labeling 
with Patch-seq in cortical slices derived from 
human neurosurgical resections. Mapping of 
recorded neurons to transcriptomically defined 
types in our human cortical taxonomy facili- 
tated a detailed functional annotation of hu- 
man neocortical GABAergic subclasses and 
select abundant transcriptomic types. 


Results 


We performed Patch-seq experiments to char- 
acterize the electrophysiological, morphological, 
and transcriptomic profiles of GABAergic neu- 
rons in human neocortical brain slices using a 
well-established Patch-seq platform (13-15). To 
expand our sampling capability, both tempo- 
rally and to facilitate preferential targeting 
with viral genetic labeling, we implemented 
a human ex vivo brain slice culture paradigm 
(34). Brain slices were transduced with an 
adeno-associated virus (AAV) vector CN1390 
that has been shown to fluorescently label hu- 
man cortical GABAergic neurons across all four 
canonical GABAergic subclasses (35), which 
enabled direct visualization and targeting for 
Patch-seq experiments (Fig. 1A) (2, 35). 

We obtained human samples from neuro- 
surgically resected neocortical tissues and 
processed them with standardized protocols 
across three experimental sites in the U.S., the 
Netherlands, and Hungary (14, 15, 18, 36); 
most samples originated from the temporal 
and frontal lobes (data S1). Human Patch-seq 
neurons were mapped to a middle temporal 
gyrus (MTG) single-nucleus transcriptomics- 
based reference taxonomy (1) and assigned to 
a transcriptomic subclass and type by using a 
“tree-mapping” classifier (materials and meth- 
ods) (16). 


Morphological, electrophysiological, and 
transcriptomic features in the 
slice culture paradigm 


With the use of Patch-seq recordings, we exam- 
ined the acute and culture brain slice para- 
digms with three different data modalities: 
transcriptome, morphology, and electrophys- 
iology. Patch-seq neurons from culture prep- 
arations recorded at 2 to 16 days in culture 
(DIC) had significantly (P < 0.0001) more 
genes detected than those from the acute par- 
adigm (Fig. 1B). This elevation was prevalent 
and remained consistent across all days sam- 
pled in culture (fig. S1A). Recent work has 
shown that Patch-seq samples may exhibit ele- 
vated microglial marker genes (37); therefore, 
we sought to determine if the increased number 
of genes observed in the culture paradigm was 
specific to microglial markers. We performed 
standard Seurat V4 clustering on Patch-seq neu- 
rons from the culture paradigm (38). These neu- 
rons clustered into two domains, with or without 
microglial gene expression signatures, indicating 
that more variation was explained by microglial- 
than by cell type-associated genes (fig. S1B). This 
microglial signature (data S2) was also observed 
in a subset of neurons collected with the acute 
paradigm, although the effect was less pro- 
nounced (fig. S1, B and C). Microglial gene sig- 
natures did not impact the ability to accurately 
map transcriptomic types, as neurons of the 
same subclass defined from tree mapping co- 
localized when visualizing the transcriptome 
through a uniform manifold approximation 
projection (UMAP) (39), either when integrat- 
ing by paradigm and microglial signature (fig. 
S1D) or when restricting the analysis only to 
marker genes (fig. S1E). Upon examination of 
key differentially expressed genes responsible 
for GABAergic subclass discrimination, aver- 
age gene expression was strongly correlated 
(P < 0.0001), demonstrating alignment between 
acute and culture paradigms (Fig. 1C). Collect- 
ively, when examining the normalized marker 
sum (NMS) score, a method to quantify the 
expression of mapping-related marker genes 
(14, 40), we found no difference between the 
acute and culture paradigms (Fig. 1D). These 
results suggest that the key markers used to 
map cell types are not perturbed in the culture 
paradigm or in the presence of microglial sig- 
natures, thereby allowing accurate mapping of 
transcriptomic types. 

To further examine differences between Patch- 
seq in acute and cultured slices, we performed 
an in-depth investigation into the most abun- 
dant type, L2-4 PVALB WFDC2 (hereafter called 
PVALB WFDC2). For this transcriptomic type, 
the acute and culture datasets exhibited overlap 
in transcriptomic UMAP space (Fig. 1E). We also 
evaluated 50 morphological features of cortical 
depth-matched PVALB WFDC2 neurons (Fig. 
1F) from culture (n = 9) and acute paradigms 
(n = 10) and did not find any difference between 
the two conditions (fig. S2A). Additionally, di- 
mensionality reduction of the morphological 
features showed that PVALB WFDC2 neurons 
from the culture and acute paradigms occu- 
pied similar space in the PVALB-rich portion of 
morphological UMAP, demonstrating consist- 
ency in the morphometric features between 
the two paradigms (Fig. 1H). 

Analysis of electrophysiological responses 
from PVALB WFDC2 neurons revealed that 
43 of 84 features differed across the two par- 
adigms (fig. S2B). Overlaid voltage traces of a 
single action potential (AP) and response to 
hyperpolarizing current are shown in Fig. 1, J 
and K, respectively. AP width and sag are two 
of the key features that were different between 
the two groups (Fig. 1L); however, multiple 
other cardinal intrinsic membrane properties, 
such as interspike interval or input resistance, 
were not. When projecting all 84 electrophys- 
iological features through the UMAP, PVALB 
WFDC2 neurons from both paradigms occu- 
pied a similar space (Fig. 1I). The features that 
were different between the two paradigms 
were observed at the earliest time point (2 DIC) 
and did not change with additional days in cul- 
ture (fig. S2C). With the increase in gene ex- 
pression and differences in electrophysiological 
features in the culture paradigm, we sought to 
determine if the genes responsible for physio- 
logy were preserved or altered. We examined 
114 voltage-gated ion-channel genes and found 
only seven that were different in culture ver- 
sus acute paradigm (fig. S2D). 


Comparison of features across human 
GABAergic subclasses 


To achieve a more comprehensive multimodal 
analysis, we combined the data from acute 
and culture paradigms and characterized the 
transcriptomic, intrinsic physiological, and/or 
morphological properties of 778 neurons from 
the four canonical GABAergic subclasses in hu- 
man cortex. We described neurons in cortical 
layers 2 to 6 (L2-6) across 44 transcriptomic 
types (fig. S3). When visualized in Seurat-aligned 
transcriptomic space through UMAP, the Patch- 
seq dataset overlapped with single-nucleus RNA 
sequencing (snRNA-seq) human MTG dataset 
(1), demonstrating the feasibility, consistency, 
and accuracy of mapping quality (Fig. 2A). 
PVALB, SST, VIP, and LAMP5/PAX6 interneu- 
rons were separable when visualized by elec- 
trophysiological (Fig. 2B) and morphological 
(Fig. 2C) features in UMAP space, with PVALB 
neurons occupying the most distinct subre- 
gions in both UMAPs. Proximity in the UMAPs 
suggests that quantitative features from all three 
modalities can be used to distinguish GABAergic 
subclasses. 

Dendrites and axons of a subset of neurons 
(n = 140) with sufficient labeling were imaged 
at high resolution and digitally reconstructed 
(Fig. 2D and fig. S4). Neurons from the LAMP5/ 
PAX6 subclass were dominated by classic neu- 
rogliaform morphologies, and although found 
in all cortical layers, here we report predom- 
inately on neurons in L2 and L3 [see (36) for 
an in-depth L1 investigation]. LAMP5/PAX6 
neurons were distinguished by their stellate 
dendrites, numerous primary dendrites, and 
extensive axon branching. They also had the 
shortest soma-to-branch tip Euclidean dis- 
tance for both axon and dendrite (Fig. 2E). The 
morphological dataset for the VIP subclass, 
also dominated by neurons in L2 and L3, dis- 
played more diverse morphologies, many with 
bipolar dendrites and descending axon, with 
either a wide or narrow profile. To improve 
sampling of the VIP subclass, we additionally 
Fig. 1. Slice culture paradigm and multimodal characterization of human 
cortical GABAergic neurons. (A) Tissue processing schematic for acute and 
culture paradigm, Patch-seq targeting guided by brightfield or fluorescence, 
and subsequent multimodal analysis or characterization. Scale bars: human 
brain specimen, 500 μm; brightfield of patched neuron, 10 μm; fluorescent image, 
50 μm. (B) Box plot represents the difference in the number of genes detected 
between the two paradigms. Asterisks indicate significant pairwise comparisons 
(****P < 0.0001, FDR-corrected Mann-Whitney test). (C) Scatter plot of the average 
gene expression in acute versus culture for the top 25 differentially expressed 
genes for the LAMP5/PAX6, VIP, SST, and PVALB subclasses. Red line is 
regression line in each plot. (D) Box plot representing the difference in NMS 
score between the two paradigms. There were no significant pairwise 
comparisons (FDR-corrected Mann-Whitney test). (E) UMAP representation 

of the PVALB-subclass transcriptomic space with PVALB WFDC2 acute 

and culture neurons highlighted. (F) Cortical depth-matched PVALB WFDC2 
morphologies from acute and culture shown aligned to an average cortical 
template, with corresponding voltage responses to a 1-s-long current step 
of −90 pA and rheobase +80 pA. (G) Box plots showing select morphology 
features for cortical depth-matched PVALB WFDC2 neurons from the acute 
and culture paradigm. There were no significant pairwise comparisons 
(FDR-corrected Mann-Whitney test). (H) UMAP representation of morphology 
space with PVALB WFDC2 acute and culture neurons highlighted. (I) UMAP 
representation of electrophysiology space with PVALB WFDC2 acute and 
culture neurons highlighted. (J) Overlaid single action potential sweeps from 
acute and culture PVALB WFDC2 neurons. Black lines represent the mean 

and are overlaid to the right for direct comparison. (K) Overlaid and normalized 
voltage response to a −90 pA hyperpolarizing current step from acute and 
culture PVALB WFDC2 neurons. Black lines represent the mean of the group. 
(L) Box plots showing select distinguishing features from PVALB WFDC2 neurons 
from the acute and culture paradigm. Asterisks indicate significant pairwise 
comparisons (****P < 0.0001, FDR-corrected Mann-Whitney test). ISI, interspike 
interval; FI curve slope, frequency-current curve slope. 
Fig. 2. Human neocortical GABAergic neuron subclass characterization. 

(A) Integrated snRNA-seq and Patch-seq transcriptomic UMAP of GABAergic 
neurons. UMAP representations of (B) electrophysiology and (C) morphology space. 
(D) Representative morphologies from GABAergic subclasses with each transcriptomic 
type (t-type) represented by different colors and the corresponding t-type gene names 
(primary t-type gene name representing subclass removed for clarity) displayed in 
gray at bottom. The morphologies are aligned to an average cortical template with 
corresponding voltage responses to a 1-s-long current step of −90 pA and rheobase 
+80 pA, shown below. (E) Box plots representing distinguishing morphology features by 


subclass (Kruskal-Wallis ANOVA on ranks; P < 0.05, FDR corrected. Post-hoc Dunn's 
test; *P < 0.05, **P < 0.01, FDR corrected). (F) Overlaid single action potential sweeps 
from each GABAergic subclass. Black lines represent the mean and are overlaid to the 
right for direct comparison. (G) Overlaid and normalized voltage response to a 1-s 
-90-pA hyperpolarizing current step from each GABAergic subclass. Black lines 
represent the mean of the group. (H) Box plots representing key distinguishing 
electrophysiological features by subclass (Kruskal-Wallis ANOVA on ranks; 

P < 0.05, FDR corrected. Post-hoc Dunn's test; *P < 0.05, **P < 0.01, ***P < 0.001, 
****P < 0.0001, FDR corrected). 


applied an enhancer AAV vector CN2039 to 
label human neocortical VIP types over other 
GABAergic neuron subclasses (materials and 
methods). Although the viral-labeled neurons 
recorded by Patch-seq did not exclusively map 
to VIP types, we found strong enrichment for 
diverse VIP neuron types with small cell bodies 
and bipolar dendrites in L2 and L3 (fig. S5). The 
PVALB subclass contained basket cell-like 
morphologies with multipolar dendrites and 
radially arrayed axons with a wide horizon- 
tal span. The L4-5 MEPE type within the 
PVALB subclass (Fig. 2D) had axons that spanned 
L3 to L4 and resembled mouse cortical inter- 
laminar or translaminar basket cells (16, 41). 
The SST subclass contained diverse morphol- 
ogies with distinct axonal shape. Qualitatively, 
these morphological types ranged from clas- 
sical DBCs to non-Martinotti cells (MCs). One 
human SST transcriptomic type described in 
(1) is L4-5 SST STK32A and its corresponding 
mouse homologs, Sst Hpse Cbln4 and Sst 
Hpse Sema3c. L4-5 SST STK32A and Sst Hpse 
Cbln4 shared similar morphological quali- 
ties, including being found predominately in 
L4 and L5 and extensive axonal branching that 
strongly innervated L4 (16, 42) (Fig. 2D). Mouse 
Sst Hpse Cbln4 neurons exhibit distinct electro- 
physiological properties, referred to as quasi 
fast spiking (16, 43, 44), and preferentially tar- 
get L4 pyramidal neurons (16, 43, 44), further 
suggesting that the human homolog L4-5 SST 
STK32A may also have selective connectivity 
to L4 pyramidal neurons. 

Comprehensive analysis of the associated volt- 
age traces for the four GABAergic subclasses, 


13 October 2023 


LAMP5/PAX6 (n = 75), VIP (n = 99), SST (n = 
149), and PVALB (n = 352), revealed distinct 
passive and active electrophysiological features 
(Fig. 2, B, D, and F to H), consistent with pre- 
vious rodent studies (23, 45). Example hyper- 
polarizing and depolarizing voltage traces are 
shown below the reconstructions in Fig. 2D and 
demonstrate diversity in intrinsic firing across 
the interneuron subclasses, which ranges from 
adapting to irregular or fast spiking. Figure 2F 
(right) shows overlaid normalized voltage traces 
of a single AP in response to a current pulse with 
subclass average. APs from PVALB subclass had 
the fastest kinetics, whereas LAMP5/PAX6 had 
the highest peak amplitude versus other sub- 
classes. Overlaid, normalized voltage traces in 
response to hyperpolarizing current (Fig. 2G) 
show that LAMP5/PAX6 and VIP subclasses 
had minimal sag compared with PVALB and 
SST subclasses. Several additional features fur- 
ther demonstrated the electrophysiological dis- 
tinctness of each subclass (Fig. 2H). 


Revising subclass assignment based 
on multimodal data 


SST and PVALB subclasses of neocortical GABAergic 
neurons are known to arise developmentally 
from the medial ganglionic eminence (0, 24, 46). 
Whereas most neurons from these subclasses 
are distinct, neurons from some of the finer 
transcriptomic types in both mice (26) and 
humans (7) have gene expression profiles par- 
tially consistent with both SST and PVALB sub- 
classes. We examined one such type, L2-4 SST 
FRZB (hereafter called SST FRZB), and explored 
its assignment to either the SST or PVALB sub- 
class. We analyzed snRNA-seq data to identify 
the top 20 differentially expressed genes for the 
SST and PVALB subclasses and plotted expres- 
sion in representative neurons from all SST and 
PVALB transcriptomic types. Whereas neurons 
belonging to the SST FRZB transcriptomic type 
did express SST higher than PVALB, they lacked 
expression of multiple key genes that define 
each subclass. However, they expressed multi- 
ple other key genes that define each subclass 
(Fig. 3A), confirming that assignment of this 
transcriptomic type to SST or PVALB subclass 
depends on the marker genes selected. Given 
the intermediate nature of the SST FRZB gene 
expression signatures, we sought to gain a bet- 
ter understanding of the phenotypic prop- 
erties of this transcriptomic type that exists 
along a continuum of the PVALB and SST 
subclasses. 

We took advantage of the multimodal ap- 
proach of Patch-seq to examine how the pheno- 
typic properties of SST FRZB are related to 
PVALB and SST subclasses. First, we observed 
that the vast majority of SST FRZB samples lo- 
calized to the PVALB subclass within the tran- 
scriptomic UMAP space in our Patch-seq dataset 
(Fig. 3B), consistent with the localization of 
neurons from the snRNA-seq data in previous 
work (36). Phenotypically, SST FRZB neurons 
resembled mostly the PVALB subclass. Exam- 
ination of all 84 electrophysiological features, 
visualized using a UMAP, revealed that most 
Fig. 3. Phenotypic alignment of SST FRZB neurons with PVALB subclass. 
(A) Heat map of snRNA-seq data with the top 20 differentially expressed genes 
for SST and PVALB subclass with SST FRZB highlighted in blue. Genes in 
subpanel are key genes for the major subclasses and classes. (B) Transcriptomic 
UMAP highlighting the PVALB and SST subclasses and SST FRZB in blue. 

(C) UMAP representation of electrophysiology space highlighting SST and 
PVALB subclasses and SST FRZB in blue. (D) Overlaid single action potential 
sweeps from SST FRZB and PVALB subclass; black lines represent the 

mean and are overlaid to the right for direct comparison. (E) UMAP representation 
of electrophysiology space color-coded by upstroke/downstroke ratio and 
action potential width. Box plots to the right show the distribution of data of the 
SST and PVALB subclasses with SST FRZB highlighted in blue. (Kruskal-Wallis 
of the SST FRZB neurons colocalized with the 
PVALB subclass (Fig. 3C). One defining feature 
of cortical PVALB interneurons is their fast- 
spiking behavior and narrow APs (16, 47, 48). 
We observed the AP kinetics, such as upstroke/ 
downstroke ratio, AP width, and AP firing pat- 
tern of SST FRZB neurons aligned with fast- 
spiking interneurons of the PVALB subclass 
(Fig. 3, D, E, and I). Additionally, these select 
features of the SST subclass were not different 
in recordings from acute versus culture par- 
adigms (fig. S6). 

Next, we investigated the morphological prop- 
erties of SST FRZB neurons and found a better 
overlap with PVALB than with other SST neu- 
rons (Fig. 3F). Morphological features such 
as horizontal axon extent and a measure of the 
dissimilarity of the axon and dendrite com- 
partments (axon versus dendrite earth mover’s 
distance) showed that SST FRZB neurons had 
individual morphological features that were 
more similar to PVALB basket cells than to 
SST neurons (Fig. 3G). To test whether SST FRZB 
neurons would align with PVALB basket cells 
in an unbiased way, we performed hierarchical 




ANOVA on ranks; P < 0.05, FDR corrected. Post-hoc Dunn's test; ****P < 
0.0001, FDR corrected). (F) UMAP representation of morphology space
highlighting SST and PVALB subclasses and SST FRZB in blue. (G) Morphology-
feature box plots for SST, SST FRZB, and PVALB. EMD, earth mover's distance
(Kruskal-Wallis ANOVA on ranks; P < 0.05, FDR corrected. Post-hoc Dunn's 
test; **P < 0.01, FDR corrected). (H) Morphology hierarchical coclustering of 
SST- and PVALB-subclass neurons where values represent the number of 
neurons found in each morphology type. (I) Representative morphologies from 
SST FRZB and PVALB subclasses shown aligned to an average cortical
template with associated voltage responses to a 1-s-long current step of −90 pA
and rheobase +80 pA. Morphology types from (H) are shown above each 
coclustering on all morphologies in the SST 
and PVALB subclasses. Clustering revealed 
three morphology types (1, 4, and 5) dominated 
by PVALB neurons and two types (3 and 6) 
dominated by neurons belonging to the SST 
subclass. We found that neurons in the SST 
FRZB and SST TH transcriptomic types (a re- 
lated, sparsely sampled deep-layer SST transcrip- 
tomic type that also shows gene expression 
similarities to PVALB) more often clustered with 
PVALB morphology types (Fig. 3H). Juxtaposing 
morphologies of SST FRZB and PVALB neu- 
rons also revealed their similarity to each other, 
including multipolar dendrites and a dense, local 
axon that avoids L1 (Fig. 3I). Based on these 
collective, multimodal results, we suggest that 
SST FRZB could be considered as part of the 
PVALB subclass rather than SST subclass, de- 
spite higher expression of SST than PVALB for 
most neurons of this type. This assignment has 
been applied for subclass-level analyses and sug- 
gests that collecting multimodal data is critical 
to validate transcriptomic types, particularly for 
cells with not immediately clear or intermediate 
gene expression signatures. 
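
The axon versus dendrite earth mover's distance used as a morphology feature above can be illustrated with a one-dimensional sketch. For two histograms on the same depth bins, the distance equals the summed absolute difference of their cumulative distributions times the bin width. The function name `emd_1d` and the toy histograms are hypothetical, and the feature in this study may be computed on a different representation.

```python
def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth mover's distance between two 1D histograms on shared bins.

    Both histograms are normalized to unit mass first, so the result is in
    the same units as bin_width (for example, micrometers of cortical depth).
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")

    cdf_a = cdf_b = 0.0
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        cdf_a += a / total_a
        cdf_b += b / total_b
        # Mass that still has to cross the boundary to the next bin.
        distance += abs(cdf_a - cdf_b) * bin_width
    return distance


# Toy example: axon concentrated in the top bin, dendrite in the bottom bin.
axon = [1, 0, 0]
dendrite = [0, 0, 1]
print(emd_1d(axon, dendrite))  # all mass moves two bins
```

A small value means the axon and dendrite occupy similar depths; a large value means the two compartments are displaced from each other.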


Morphological heterogeneity within SST CALB1 


We identified dramatic morphological heter- 
ogeneity within the main supragranular SST 
transcriptomic type, L1-3 SST CALB1 (here-
after called SST CALB1). In mice, Sst sub-
class neurons in L2/3 have a classic MC shape,
which is defined as having extensive L1 axon
(49). In humans, we qualitatively identified four 
morphological types within SST CALB1. Three 
of these types resembled previously described 
neurons: MCs, DBCs, and basket-like cells (BCs). 
We found classic MCs at the L2-L3 border and in 
L3. The MCs had considerable ascending axon 
resulting in an extensive plexus in L1 with com-
paratively minimal descending axon. DBCs had
a characteristic ratio of axon to dendrite width, 
which distinguishes these from other SST sub-
class neurons. We also found one SST CALB1
neuron with basket-like properties, including
radially arrayed dendrite and axonal branches
and minimal L1 axon innervation. The fourth
morphological type, a subset of the MCs, had 
unusually sparse axon compared with MCs, 
and thus we refer to them as “sparse SST” to 
distinguish them from classic MCs (Fig. 4, A 
and B). Sparse SST neurons had somas re-
stricted to L2, contained axons in L1, and had
considerable descending axons with one in- 
stance of the axon reaching deep L5 to upper 
L6. Unlike other morphology types within 
SST CALB1, this type had a wide dendrite, long 
axonal extent, and sparsely branching axon 
that plateaued in L1 and L2 (see histograms 
in Fig. 4A). Our dataset contained additional 
SST CALB1 neurons (fig. S4) that we did not 
categorize into these four morphology types 
because of insufficient axonal information to 
make a confident morphological qualification. 

Next, we investigated how passive and active
electrophysiological properties varied for the
four morphological types of SST CALB1. Despite
the limited number of neurons, we observed 
potential trends emerging that suggest that 
particular features may correspond with dis- 
tinct morphologically defined SST CALB1 neu- 
rons. For example, sparse SST neurons show a 
lower degree of sag, a higher adaptation ratio, 
and a slightly delayed onset of AP firing as 
compared with DBCs and MCs, whereas the 
peak of the AP is highest in MCs (Fig. 4C). 
These data suggest that a correspondence might 
exist between functional features and mor- 
phology for neurons within the heterogeneous 
SST CALB1 transcriptomic type. However, val-
idation in a larger number of neurons is needed
to confirm these findings. 
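
The figure captions report P values as "FDR corrected" without naming the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up method (an assumption, not something the text confirms), adjusted P values can be computed as follows. The function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four feature comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p:.3f}  adjusted = {q:.3f}")
```

With few neurons per morphology type, as here, corrected values near the 0.05 threshold are best read as trends rather than firm differences.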

We next sought to identify differentially ex- 
pressed genes associated with the morpho- 
logical heterogeneity within the SST CALB1 
transcriptomic type and determine whether 
neurons could independently be grouped into 
morphological types based on gene expres- 
sion. With this variable gene set, three distinct 
clusters appeared with morphological types 
largely separated across clusters (Fig. 4D). These 
results suggest that there are genetic corre- 
lates of distinct phenotypic properties within 
the SST CALB1 type. 


Double bouquet cells 


Among the SST CALB1 neurons, we identified 
the double bouquet cell type, which is specialized in pri-
mates (50, 51) but has also been described in
specific carnivores (52-54). DBCs are described
as having a “horse-tail” morphology with as- 
cending and descending narrow axon bundles 
(55). Though this morphology is well accepted 
in the field, the definition of molecular mark- 
ers as well as their transcriptomic-type iden- 
tities remain incompletely resolved (56). We 
found that these classical DBC morphologies 
map to two closely related transcriptomic types, 
SST CALB1, mentioned above, and L3-5 SST 
ADGRG6 (hereafter called SST ADGRG6). DBCs 
from both transcriptomic types had classic 
horse-tail axon collaterals extending down 
to L5 and frequently up to L1. They often con-
tained multiple descending axon bundles with
short perpendicular branches. In addition to 
a narrow axon signature, we found their den-
drites to be narrow and either bitufted or multi-
polar (Fig. 5A). MERFISH analysis to map the
spatial distributions of all types in the MTG
taxonomy revealed that the SST CALB1 tran-
scriptomic type was mostly restricted to L2,
whereas SST ADGRG6 was mostly restricted 
to L3 (Fig. 5B), in close alignment with soma 
distributions from Patch-seq experiments. In 
agreement with the differing soma distribu- 
tion patterns, we found a slight shift in their 
axon peak distribution, with SST CALB1 axon
shifted more superficially than that of SST
ADGRG6 (Fig. 5A, histograms at right). We
also found that six out of seven DBCs in the 
SST CALB1 transcriptomic type were from the 
temporal cortex, whereas three out of four 
DBCs in the SST ADGRG6 transcriptomic type 
were from the frontal cortex, suggesting a pos- 
sible differential abundance or enrichment 
across the cortex (fig. S7A). 

Given that we found DBCs within two tran- 
scriptomic types and taking into account a 
recent report of variable or inconsistent gene- 
expression signatures for morphologically de- 
fined human DBCs (56), we next sought to 
explore how these results compared to marker- 
gene expression for human DBCs in our data- 
set. Most DBCs showed strong expression of 
CALB1 (calbindin), whereas only one showed 
expression of CALB2 (calretinin). All but one 
neuron showed some degree of SST expression 
and mixed expression of TAC1, NOS1, and NPY 
(fig. S7A). With this variable expression and 
limited sample size, we could not identify key 
genes specific to morphologically defined DBCs. 

To characterize the functional properties of DBCs, 
we examined the voltage responses to hyper- 
polarizing and depolarizing current. DBCs 
displayed a range of firing patterns from ac- 
commodating and stuttering to irregular spiking 
(Fig. 5C). Furthermore, in the electrophysiolog- 
ical UMAP space, DBCs from the SST CALB1 
and SST ADGRG6 transcriptomic types were 
intermingled with other SST neurons (Fig. 5D, 
middle). Representative electrophysiological 
features showed varying responses across 
individual DBCs and the two transcriptomic 
types. One main feature that was consistent 
and prominent with all DBCs was a higher 
sag ratio (Figs. 4C and 5C), in close agreement 
with the recently reported electrophysiological 
signature of human DBCs (56). 
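
For readers unfamiliar with the sag measurement, a minimal sketch follows. It uses one common convention: the rebound from the peak hyperpolarization to the steady-state voltage, divided by the peak deflection from baseline. The function name `sag_ratio_from_trace`, the steady-state window, and the toy voltages are assumptions, and the exact definition used in this study may differ.

```python
def sag_ratio_from_trace(pre_step_mv, step_mv, n_steady):
    """Sag ratio for a hyperpolarizing current step.

    pre_step_mv : samples before the step (mV), averaged as the baseline
    step_mv     : samples during the step (mV)
    n_steady    : number of final step samples averaged as steady state
    """
    baseline = sum(pre_step_mv) / len(pre_step_mv)
    peak = min(step_mv)                          # most hyperpolarized point
    steady = sum(step_mv[-n_steady:]) / n_steady
    deflection = peak - baseline
    if deflection == 0:
        raise ValueError("no voltage deflection during the step")
    # 0 means no sag; values toward 1 mean a strong rebound.
    return (peak - steady) / deflection


# Toy trace: baseline -70 mV, peak -90 mV, relaxing to -85 mV.
pre = [-70.0] * 5
step = [-80.0, -90.0, -88.0, -86.0, -85.0, -85.0, -85.0]
print(sag_ratio_from_trace(pre, step, n_steady=3))
```

Because sag reflects HCN-channel activity, a higher ratio in DBCs would be consistent with stronger hyperpolarization-activated current, although this sketch does not test that.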

In morphological UMAP space, we saw DBCs 
from these two transcriptomic types cluster 
near each other with perhaps some separation 
at the transcriptomic-type level (Fig. 5D, right). 
Bouton density measurements did not show 
differences between the two types (Fig. 5E and 
fig. S7B). Therefore, we observed two distinct 
populations of human DBCs that were pheno- 
typically virtually indistinguishable but had dif- 
ferent areal enrichment and mapped to specific 
SST transcriptomic types occupying distinct 
laminar positions in the cortical depth. 


Human versus mouse 


Species divergence in cortical GABAergic 
neuronal properties could underlie putative 
species-specific cognitive abilities and may 
have important implications for translational 
research. For example, prior work has con- 
sistently identified higher hyperpolarization- 
activated cyclic nucleotide-gated (HCN)-channel 
expression and function in human versus mouse 
cortical neuron types (57, 58), which is of po- 
tential clinical relevance given that antiepileptic 

Fig. 4. Morphological heterogeneity within SST CALB1 transcriptomic type.
(A) Morphologies from SST CALB1 t-type categorized by qualitative morphology
type shown aligned to an average cortical template, with histograms to the
right of the morphologies displaying average dendrite (darker color) and axon
(lighter color) branch length by cortical depth (shading shows ±1 SD about the
mean; soma locations are represented by black circles). Voltage responses to a
1-s-long current step of −90 pA and rheobase +80 pA are shown below. Box
plots representing key morphological (B) and electrophysiological (C) features
by morphology type (Kruskal-Wallis ANOVA on ranks; P < 0.05, FDR corrected. 
Post-hoc Dunn's test; *P < 0.05, **P < 0.01, FDR corrected). (D) UMAP of 

all SST CALB1 neurons based on 253 genes differentially expressed between 
10 DBC, MC, and sparse SST neurons. Neurons cluster into three main groups 
corresponding to morphology types (colored markers). Neurons with unknown 
morphologies are represented by gray markers. 

−90 pA and rheobase +80 pA for each respective DBC. (B) Spatial distribution for SST CALB1 and SST ADGRG6 revealed by MERFISH. (C) Four specific electrophysiology 
features shown as box or scatter plots for the SST CALB1 and SST ADGRG6 t-types with putative DBCs highlighted in green. (D) UMAP representation of transcriptomics 
(isolated to the SST subclass), electrophysiology, and morphology. SST CALB1 and SST ADGRG6 t-types are colored orange and brown, respectively, and qualitatively 
defined putative DBCs are highlighted in green. (E) 63x MIP inset corresponding to Fig. 4A showing axon horse-tail bundles and boutons. 


drugs such as lamotrigine have been shown 
to activate HCN channels in dendrites (59). 
To directly explore the degree of conservation 
of cortical GABAergic subclasses from mice to 
humans, we compared our data with a previ- 
ously published and comprehensive Patch-seq 
dataset of cortical GABAergic neurons in the 
mouse primary visual cortex (16). In previous
work (1), transcriptomic types in humans and
mice were aligned based on shared gene ex- 
pression covariation to generate a set of genes 
to define mouse and human homologous cell 
types. We focused on supragranular neurons 
of two homologous types: (i) Pvalb 2, which 
consisted of human transcriptomic types PVALB 
WFDC2 and L4-6 PVALB SULF1 and mouse 
transcriptomic types Pvalb Calb1 Sst, Pvalb
Reln Itm2a, Pvalb Reln Tac1, and Pvalb Sema3e
Kank4, and (ii) Sst 5, which consisted of human
SST CALB1 and mouse Sst Calb2 Necab1, Sst
Calb2 Pdlim5, Sst Mme Fam114a1, Sst Tac1
Htr1d, and Sst Tac1 Tacr3.

Both Pvalb 2 and Sst 5 exhibited variations in 
morphological features across species (Fig. 6B). 
Human neurons had less axonal and dendritic 
branching yet occupied a larger spatial extent. 
We found no variation in axon total length 
between the homologous types. Taken together, 
these findings indicate that mouse neurons have 
a denser axonal plexus in closer proximity to 
their soma. In the specific case of Sst 5, several 
of the morphology features that differ across 
species reflect the change from L2/3 Martinotti 
neurons in mice to a more diverse set of L2 and 
L3 human SST morphologies that have drastical- 
ly different axonal shape compared to mouse. 

A variety of single-neuron electrophysiolog-
ical features were found to be distinct between
mice and humans. In the electrophysiological 
UMAP, both Pvalb 2 and Sst 5 showed clear 
separation between humans and mice (Fig. 6C). 
Average traces from each type and species 
shown in Fig. 6D and quantified in Fig. 6E 
highlight the differences in sag, rheobase, and 
AP half-width. Features related to AP firing 
such as average firing rate and AP frequency- 
current curve slope were found to differ across 
species for the Pvalb 2 but not for the Sst 5 
homologous type (Fig. 6D and fig. S8B). Given 
that we previously established that select elec- 
trophysiological features (i.e., sag and AP width) 
of PVALB WFDC2 were different between the 
acute and culture paradigms, and given that all
mouse data in this analysis were derived ex-
clusively from acute brain slices, we sought to
evaluate to what extent the slice preparation 
method could explain any observed species 
differences within Pvalb 2 or Sst 5 homolo- 
gous types. Only the species difference in AP 
width for Pvalb 2 was driven primarily by the 
culture condition, whereas all other species 
differences in electrophysiological features 
were robustly detected irrespective of acute 
versus culture paradigm (fig. S9). 


Discussion 


The diversity of cortical interneurons has made
it a major challenge to classify them and charac-
terize their defining properties. Our work lever-
aged a human MTG cell type taxonomy (1) as a
reference to map the transcriptomic identi-
ties of patch clamp-recorded human cortical
GABAergic neurons in ex vivo brain slices.
This taxonomy contains 45 GABAergic neu- 
ronal transcriptomic types across four canon- 
ical interneuron subclasses (PVALB, SST, VIP, 
and LAMP5/PAX6). In the present study, we 
combined viral genetic labeling with Patch-seq 
to target human cortical GABAergic neurons. 
We performed Patch-seq experiments on 778 
human cortical GABAergic neurons and ag- 
gregated our data at the subclass level to un- 
cover the signature morphoelectric properties 
of these canonical human interneuron sub- 
classes, and where feasible, dove deeper into 
features of select GABAergic transcriptomic 
types. Although many of the GABAergic tran- 
scriptomic types were not sampled at sufficient 
depth to conclude their defining properties 
in this study, our dataset achieved coverage 
of 44 of the 45 GABAergic transcriptomic types, 
many of which represent rare cell types. 

We characterized the changes that occurred in 
this short-term slice culture and viral-labeling 
paradigm in multiple data modalities including 
gene expression and electrophysiological and 
morphological features (where sampling depth 
permitted). Although discrete changes were 
evident and described in electrophysiology, 
these putative culture differences did not pre- 
clude or hinder high confidence mapping of 
recorded neurons to GABAergic subclasses 
and transcriptomic types in the human MTG 
taxonomy. This is well explained by the fact 
that marker genes important for mapping at 
the GABAergic subclass level were robustly 
detected overall and minimally impacted by 
slice culture or viral transduction. Additionally, 
upon further analysis of voltage-gated ion-
channel genes, expression of 7 out of 114 genes
was different between the two paradigms. Three 
of the seven were genes that code for sodium 
channel subunits, which may explain some of 
the differences we observed in select features 
such as AP width. We identified a gene ex- 
pression signature in a subset of our dataset 
that corresponded to a module of microglia- 
related genes. This identical signature was
observed for both neurons recorded in acute
and cultured slices (but is more pronounced in 
culture) and has been similarly observed in other 
human and mouse acute brain slice Patch- 
seq datasets (15-17, 36, 37). Further work is 
warranted to understand the cause and con-
sequences of this gene expression signature
in brain slice experiments. 

Viral genetic targeting with Patch-seq ex- 
periments enhanced the coverage of transcrip- 
tomic types in the SST subclass and allowed us 
to aggregate a sizable dataset of neurons that 
map to SST CALB1. These neurons exhibited 
high heterogeneity in morphoelectric features 
that might seem incongruent with the map- 
ping outcome to a single transcriptomic type 
in the taxonomy. At least three discrete mor- 
phological types were recognized including 
DBCs with classical descending long horse- 
tail axon bundles (55), MCs with ascending 
axons reaching L1, and a type we labeled sparse 
SST for the sparsely branched and lengthy axon 
segments. Further grouping by morphology 
type within this SST CALB1 transcriptomic type 
revealed strong coherence of electrophysiolog- 
ical features with many distinctive features 
between groups. These results suggest that 
further splitting of the SST CALB1 transcrip-
tomic type may be warranted but will require
additional studies to fully resolve, presum- 
ably requiring deeper multimodal phenotypic 
characterization. 

Another important outcome from our study 
is the delineation of the transcriptomic types 
containing human DBCs. In addition to SST 
CALB1 transcriptomic-type mapping, we also 
observed DBCs mapping to a second transcrip- 
tomic type in the SST subclass, SST ADGRG6. 
DBCs mapping to SST CALB1 were predomi- 
nantly found in the temporal cortex, whereas 
DBCs mapping to SST ADGRG6 were enriched 
in the frontal cortex, which suggests possible 
regional differences in abundance of these pu- 
tative DBC subtypes across the human cortex. 
Furthermore, spatial transcriptomics revealed 
a clear shift in laminar distribution between 
the two transcriptomic types, with SST CALB1 
soma distribution mostly restricted to L2, and 
SST ADGRG6 soma distribution mostly re- 
stricted to L3. DBCs sampled from Patch-seq 
experiments follow this trend of distinct lam- 
inar positioning of the two transcriptomic 
types. We were not able to discriminate DBCs 
versus non-DBCs within these transcriptomic 
types using currently available spatial method-
ologies. Nonetheless, these data are consistent
with the existence of at least two subtypes of 
human DBCs. Whether these transcriptomi- 
cally defined subtypes align to previously re- 
ported DBC subtypes in the human cortex (54) 
or if there exist functional specializations be- 
tween these subtypes remains to be deter- 
mined. Our findings corroborate and extend 
recent work (56) and provide important clues 
about the molecular identity, synaptic con- 
nectivity, and function of human DBCs. The 
overall picture from these two independent 
studies is that human DBCs are more hetero- 
geneous than previously understood. 

A limitation of the current dataset was the 
low number of neurons with classic Martinotti 
morphologies in the SST subclass. This may be 
the result of biased sampling of supragranular 
layers in our experiments, whereas MCs could 
instead be more abundant in infragranular 
layers of the human neocortex. The abundance
and spatial distribution of MCs in the human
cortex have not been detailed to our knowledge.
We observed denser GABAergic neuron label- 
ing in L2 and L3 with the CN1390 GABAergic 
enhancer AAV vector, which suggests possible 
enrichment for GABAergic neuron types most 
abundant in these layers (i.e., SST CALB1). 
However, other neuron types with expected 
enrichment in supragranular layers such as 
chandelier cells and many diverse transcrip- 
tomic types of the VIP subclass were poorly 
sampled in this study. The underlying ex- 
planation remains unclear, but one possibil- 
ity may be laminar shifts in these cell types 
in the human versus rodent cortex in light of 
prior findings of laminar shifts, or variation by 
cortical area in cell-type marker genes in the 
mouse versus human neocortex (60). Changes 
in the abundance or laminar distribution of 
cell types can be resolved with spatial tran- 
scriptomics approaches applied in mouse ver- 
sus human brain samples or by comparing 
across diverse mammalian species (1, 18). For 
example, chandelier cells are known to be en- 
riched at the L1-L2 border in the mouse cortex 
(61) but instead show enrichment at the L3-L4 
border in the human temporal cortex (7). Rapid 
progress in single-cell epigenetic data genera- 
tion (62-65) and continued discovery of brain 
cell type-specific enhancers suitable for use in 
viral vectors (35, 66-68) are likely to be the key 
to targeted analysis of these and other impor- 
tant human cortical interneuron types. Indeed, 
we have demonstrated proof of concept for im- 
proved sampling of VIP subclass neurons in 
human neocortical slices using an enhancer 
AAV vector. Because of the wide diversity of 
transcriptomic types in the VIP subclass and 
the limited sampling achieved in our dataset 
by using the more general GABAergic label- 
ing approach, viral tools such as this are likely 
to hasten the progress for deeper multimodal 
analysis of poorly sampled human VIP tran- 
scriptomic types. 

Perhaps the most impactful finding of our 
study is the direct demonstration of how multi- 
modal Patch-seq data are vital to refinement 
of transcriptomic cell type taxonomies. Based 
only on transcriptomics, no clear assignment 
of SST FRZB to SST or PVALB subclass was 
possible, and this inherent ambiguity is likely 
rooted in the shared developmental origin of 
SST and PVALB subclasses (46). However, our 
electrophysiology and morphology results clear- 
ly resolve that the phenotypic properties of 
SST FRZB transcriptomic type are most par- 
simonious with PVALB subclass assignment 
in the human neocortical cell type taxonomy. 
This underscores the notion that cellular tax- 
onomies built on single-cell transcriptomes 
and differential gene expression are not static, 
but rather represent a starting foundation to 
build upon as new data modalities are obtained 
and aligned at the resolution of transcriptomic 
types. Thus, transcriptomic cell type taxon- 
omies are necessary but not always sufficient 
to infer meaningful functional types with high 
accuracy. Alignment of multimodal data such 
as spatial distribution and abundance, cellular 
morphology, axonal projections and connectiv- 
ity, neuromodulation, and intrinsic and synaptic 
electrophysiological properties will be essen- 
tial to refine and extend foundational cellular 
taxonomies of the brain. 

In this study, we have provided a first-order 
characterization of the signature morphoelec- 
tric properties of the canonical human cortical 
GABAergic neuron subclasses and select tran- 
scriptomic types. Despite the apparent conserva- 
tion of canonical cortical GABAergic subclasses, 
we uncovered previously unrecognized species 
differences in fundamental morphological and 
electrophysiological features of neocortical PVALB 
and SST homologous types. Precisely how such 
anatomical and functional differences contrib- 
ute to specific human cognitive abilities or to 
selective vulnerability of discrete neuron types 
in disease remains to be determined. This work 
provides a promising roadmap for future func- 
tional studies of human brain cell types at the 
resolution of emerging transcriptomic cell type 
taxonomies and provides a rich open-access 
dataset for exploring gene-function relation- 
ships for a wide diversity of human neocortical 
GABAergic neuron types. 


Materials and Methods 
Human tissue acquisition 


Surgical specimens were obtained from local 
hospitals (Harborview Medical Center, Swedish 
Medical Center, and University of Washington 
Medical Center) in collaboration with local 
neurosurgeons. Data included in this study 
were obtained from neurosurgical tissue resec- 
tions for the treatment of refractory temporal 
lobe epilepsy or deep brain tumor. All patients 
provided informed consent and experimental 
procedures were approved by hospital institutional
review boards before commencing the study.
Tissue was placed in slicing artificial cerebro-
spinal fluid (ACSF) as soon as possible follow-
ing resection. Slicing ACSF comprised (in mM):
92 N-methyl-D-glucamine chloride (NMDG-
Cl), 2.5 KCl, 1.2 NaH₂PO₄, 30 NaHCO₃, 20 4-(2-
hydroxyethyl)-1-piperazineethanesulfonic acid
(HEPES), 25 D-glucose, 2 thiourea, 5 sodium-
L-ascorbate, 3 sodium pyruvate, 0.5 CaCl₂·4H₂O,
and 10 MgSO₄·7H₂O. Before use, the solution
was equilibrated with 95% O₂, 5% CO₂ and the
pH was adjusted to 7.3 by addition of 5 N HCl
solution. Osmolality was verified to be between
295 and 305 mOsm kg⁻¹. Human surgical tissue
specimens were immediately transported (15 
to 35 min) from the hospital site to the lab- 
oratory for further processing. 


Human neurosurgical specimens and 
ethical compliance 


The neurosurgical tissue specimens collected 
for this study were apparently non-pathological 
tissues removed during the normal course of 
surgery to access underlying pathological tis- 
sues. Tissue specimens were determined to be 
nonessential for diagnostic purposes by medical 
staff and would have otherwise been discarded. 
Tissue procurement from neurosurgical do- 
nors was performed outside of the supervision 
of the Allen Institute at a local hospital and 
tissue was provided to the Allen Institute un- 
der the authority of the institutional review 
board of the participating hospital. A hospital- 
appointed case coordinator obtained informed 
consent from the donor before surgery. Tissue 
specimens were deidentified before receipt by 
Allen Institute personnel. 


Tissue processing 


Human acute and cultured brain slices (350 µm) 
were prepared with a Compresstome VF-300 
(Precisionary Instruments) modified for block- 
face image acquisition (Mako G125B PoE cam- 
era with custom integrated software) before 
each section to aid in registration to the com- 
mon reference atlas. Brains or tissue blocks 
were mounted to preserve intact pyramidal 
neuron apical dendrites within the brain slice. 
Slices were transferred to carbogenated (95%
O₂/5% CO₂) and warmed (34°C) slicing ACSF
to recover for 10 min according to the NMDG
protective recovery method (69). Acute brain 
slices were then transferred to room temper- 
ature holding ACSF of the composition (in mM): 
92 NaCl, 2.5 KCl, 1.2 NaH₂PO₄, 30 NaHCO₃,
20 HEPES, 25 D-glucose, 2 thiourea, 5 sodium-
L-ascorbate, 3 sodium pyruvate, 2 CaCl₂·4H₂O
and 2 MgSO₄·7H₂O for the remainder of the
day until transferred for patch clamp record-
ings. Before use, the solution was equilibrated
with 95% O₂, 5% CO₂ and the pH was adjusted
to 7.3 using NaOH. Osmolality was verified to
be between 295 and 305 mOsm kg⁻¹. Alterna-
tively, slices for interface culture were placed onto
membrane inserts (Millipore) in 6-well plates
with 1 mL per well of slice culture medium of
the composition: 8.4 g/L MEM Eagle medium,
20% heat-inactivated horse serum, 30 mM
HEPES, 13 mM D-glucose, 15 mM NaHCO₃,
1 mM ascorbic acid, 2 mM MgSO₄·7H₂O, 1 mM
CaCl₂·4H₂O, 0.5 mM GlutaMAX-I and 1 mg/L
insulin. The slice culture medium was carefully
adjusted to pH 7.2 to 7.3 and osmolality of 300
to 310 mOsmoles per kilogram by addition of
pure H₂O, sterile-filtered and stored at 4°C for
up to 2 weeks. Culture plates were placed in a
humidified 5% CO₂ incubator at 35°C, and the
slice culture medium was replaced every two to 
three days until endpoint analysis. One to three 
hours after brain slices were plated on cell cul- 
ture inserts, brain slices were infected by direct 
application of concentrated AAV viral particles 
over the slice surface (34). 


AAV vector cloning and viral packaging 


The DLX2.0 enhancer consists of 3 concate- 
nated copies of the DLX core enhancer region 
(70). The ultraconserved 131 bp core DLX en- 
hancer region has 100% linear sequence iden- 
tity between mouse and human. The DLX2.0 
fragment was custom gene synthesized and sub- 
cloned by standard restriction enzyme digestion 
and ligation into a recombinant AAV vector 
backbone upstream of the beta globin mini- 
mal promoter, SYFP2 reporter transgene, short 
woodchuck hepatitis virus posttranscriptional 
regulatory element (WPRE3) and bovine growth 
hormone (BGH) polyA sequence to produce AAV 
vector CN1390. The CN1390 plasmid map as 
well as plasmid DNA and PHP.eB serotype viral 
aliquots are available from Addgene (plasmid 
#163505). The VIP enhancer candidate eHGT_354h 
was selected using previously published hu- 
man single cell ATAC-seq data (35) based on 
proximity to the surrogate VIP subclass marker 
gene CALB2 and accessible ATAC-seq peak 
exclusively in the VIP subclass versus all other 
cortical cell type subclasses in the dataset. The 
candidate genomic enhancer was PCR ampli- 
fied from human genomic DNA and cloned by 
standard restriction enzyme digestion and liga- 
tion into a recombinant AAV vector backbone 
as mentioned above (but containing a custom 
designed 3XSP10 insulator sequence 5’-gaag- 
ctacccctaacacactattctacacacagaaaatgctcttcac- 
taggaagctacccctaacacactattctacacacagaaaatgct 
cttcactaggaagctacccctaacacactattctacacacagaa- 
aatgctcttcactag-3’ upstream of the enhancer po- 
sition) to yield AAV vector CN2039. The CN2039 
plasmid map as well as plasmid DNA will be 
available from Addgene (plasmid #208401). 
Enhancer AAV plasmids CN1390 and CN2039 
were Maxi-prepped and transfected with PEI Max 
40K (Polysciences Inc., catalog # 24765-1) into 
one 15-cm plate of AAV-293 cells (Cell Biolabs 
catalog # AAV-100), along with helper plasmid 
pHelper (Cell BioLabs) and PHP.eB rep/cap 
packaging plasmid (Chan et al., 2017), with a to-
tal mass of 150 µg PEI Max 40K, 30 µg pHelper,
15 µg rep/cap plasmid, and 15 µg enhancer-AAV
vector. The next day medium was changed to 
1% FBS, and then after 5 days, cells and super- 
natant were harvested and AAV particles re- 
leased by three freeze-thaw cycles. Lysate was 
treated with benzonase after freeze-thaw to
degrade free DNA (2 µL benzonase, 30 min at
37°C, MilliporeSigma catalog # E8263-
25KU), and then cell debris was precleared
with a low-speed spin (1500 g, 10 min), and fi-
nally the crude virus was concentrated over
a 100 kDa molecular weight cutoff Centricon
column (MilliporeSigma catalog # Z648043) to
a final volume of ~150 µL. For highly purified
large-scale preps this protocol was altered so 
that ten plates were transfected and harvested 
together at 3 days after transfection, and then 
the crude virus was purified by iodixanol gra- 
dient centrifugation. 


Patch clamp recording 


Slices were continuously perfused (2 mL/min) 
with fresh, warm (34°C) recording ACSF con- 
taining the following (in mM): 126 NaCl, 2.5 KCl,
1.25 NaH2PO4, 26 NaHCO3, 12.5 D-glucose,
2 CaCl2·4H2O and 2 MgSO4·7H2O (pH 7.3) and
continuously bubbled with 95% O2/5% CO2. The
bath solution contained blockers of fast gluta- 
matergic (1 mM kynurenic acid) and GABAergic 
synaptic transmission (0.1 mM picrotoxin). Thick- 
walled borosilicate glass (Warner Instruments, 
G150F-3) electrodes were manufactured (Narishige
PC-10) with a resistance of 4-5 MΩ. Before
recording, the electrodes were filled with ~1.0 to
1.5 µL of internal solution with biocytin [10 mM
potassium gluconate, 10.0 mM HEPES, 0.2 mM 
ethylene glycol-bis (2-aminoethylether)-N,N,N’,
N’-tetraacetic acid, 4 mM potassium chloride,
0.3 mM guanosine 5’-triphosphate sodium salt
hydrate, 10 mM phosphocreatine disodium salt
hydrate, 1 mM adenosine 5’-triphosphate mag-
nesium salt, 20 µg/mL glycogen, 0.5 U/µL RNAse
inhibitor (Takara, 2313A) and 0.5% biocytin 
(Sigma B4261), pH 7.3]. The pipette was mounted 
on a Multiclamp 700B amplifier headstage (Mo- 
lecular Devices) fixed to a micromanipulator 
(PatchStar, Scientifica). 

The composition of bath and internal solu-
tion as well as preparation methods were chosen
to maximize the tissue quality of slices from
adult mice and to align with solution compositions
typically used in the field (to maximize the chance
of comparison to previous studies), and were modified
to reduce RNAse activity and ensure maximal
gain of mRNA content.

Electrophysiology signals were recorded using 
an ITC-18 Data Acquisition Interface (HEKA). 
Commands were generated, signals processed, 
and amplifier metadata were acquired using 
MIES written in Igor Pro (Wavemetrics). Data 
were filtered (Bessel) at 10 kHz and digitized 
at 50 kHz. Data were reported uncorrected for 
the measured (Neher, 1992) -14 mV liquid junc-
tion potential between the electrode and bath
solutions. 

Prior to data collection, all surfaces, equip- 
ment, and materials were thoroughly cleaned 
in the following manner: a wipe down with DNA 
away (Thermo Scientific), RNAse Zap (Sigma- 
Aldrich), and finally nuclease-free water. After 
formation of a stable seal and break-in, the 
resting membrane potential of the neuron was 
recorded (typically within the first minute). A 
bias current was injected, either manually or 
automatically using algorithms within the MIES 
data acquisition package, for the remainder of 
the experiment to maintain that initial resting
membrane potential. Bias currents remained
stable for a minimum of 1 s before each stimulus
current injection. 

To be included in analysis, a neuron needed 
to have a >1 GΩ seal recorded before break-in
and an initial access resistance <20 MΩ and
<15% of the Rinput. To stay below this access
resistance cut-off, neurons with a low input 
resistance were successfully targeted with 
larger electrodes. For an individual sweep to be 
included, the following criteria were applied: 
(i) the bridge balance was <20 MΩ and <15%
of the Rinput; (ii) bias (leak) current 0 ± 100 pA;
and (iii) root mean square noise measurements 
in a short window (1.5 ms, to gauge high fre- 
quency noise) and longer window (500 ms, 
to measure patch instability) were <0.07 and 
0.5 mV, respectively. 
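
The inclusion criteria above can be written as two simple checks. The following is a minimal sketch and not the code used in this study: the function and argument names are hypothetical, the thresholds are copied from the text, and we assume that the 0.5 mV noise bound is an upper limit like the 0.07 mV bound.

```python
def cell_passes_qc(seal_gohm, access_mohm, input_mohm):
    """Cell-level criteria: >1 GOhm seal, access resistance <20 MOhm
    and <15% of the input resistance (all names are hypothetical)."""
    return (seal_gohm > 1.0
            and access_mohm < 20.0
            and access_mohm < 0.15 * input_mohm)


def sweep_passes_qc(bridge_mohm, input_mohm, bias_pa,
                    rms_short_mv, rms_long_mv):
    """Sweep-level criteria (i) to (iii) from the text."""
    bridge_ok = bridge_mohm < 20.0 and bridge_mohm < 0.15 * input_mohm
    bias_ok = abs(bias_pa) <= 100.0  # bias (leak) current within 0 +/- 100 pA
    # Short-window (1.5 ms) and long-window (500 ms) RMS noise bounds.
    noise_ok = rms_short_mv < 0.07 and rms_long_mv < 0.5
    return bridge_ok and bias_ok and noise_ok
```

Expressing each criterion as a named boolean makes it easy to report which criterion excluded a given sweep.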

Upon completion of electrophysiological ex- 
amination, the pipette was centered on the 
soma or placed near the nucleus (if visible). 
A small amount of negative pressure was ap- 
plied (~-30 mbar) to begin cytosol extrac- 
tion and attract the nucleus to the tip of the 
pipette. After approximately one minute, the 
soma had visibly shrunk and/or the nucleus 
was near the tip of the pipette. While main- 
taining the negative pressure, the pipette was 
slowly retracted in the x and z direction. Slow, 
continuous movement was maintained while 
monitoring pipette seal. Once the pipette seal 
reached >1 GΩ and the nucleus was visible on
the tip of the pipette, the speed was increased 
to remove the pipette from the slice. The pi- 
pette containing internal solution, cytosol, and 
nucleus was removed from the pipette holder 
and contents were expelled into a PCR tube 
containing lysis buffer (Takara, 634894). 


Transcriptomic Data Collection 
cDNA amplification and library construction 


We used the SMART-Seq v4 Ultra Low Input 
RNA Kit for Sequencing (Takara, 634894) to 
reverse transcribe poly(A) RNA and amplify 
full-length cDNA according to the manufac- 
turer’s instructions. We performed reverse tran- 
scription and cDNA amplification for 20 PCR 
cycles in 0.65 ml tubes, in sets of 88 tubes at a 
time. At least one control eight-strip was used 
per amplification set, which contained four 
wells without cells and four wells with 10 pg 
control RNA. Control RNA was either Univer- 
sal Human RNA (UHR) (Takara 636538) or 
control RNA provided in the SMART-Seq v4 
kit. All samples proceeded through Nextera 
XT DNA Library Preparation (Illumina FC-131-
1096) using either Nextera XT Index Kit V2 
Sets A-D (FC-131-2001, 2002, 2003, 2004) or 
custom dual-indexes provided by IDT (Inte- 
grated DNA Technologies). Nextera XT DNA 
Library prep was performed according to man- 
ufacturer’s instructions, except that the vol- 
umes of all reagents including cDNA input 
were decreased to 0.2× by volume. Each sam-
ple was sequenced to approximately 500 k reads.


RNA-sequencing 


Fifty-base-pair paired-end reads were aligned 
to GRCm38 (mm10) using a RefSeq annota-
tion gff file retrieved from NCBI on 18 January
2016 (https://www.ncbi.nlm.nih.gov/genome/
annotation_euk/all). Sequence alignment was
performed using STAR v2.5.3 (71) in two-pass
mode. PCR duplicates were masked and removed
using STAR option “bamRemoveDuplicates”.
Only uniquely aligned reads were used for gene
quantification. Gene counts were computed
using the R GenomicAlignments package (72)
summarizeOverlaps function using “IntersectionNotEmpty”
mode for exonic and intronic regions sepa- 
rately. Exonic and intronic reads were added 
together to calculate total gene counts; this 
was done for both the reference dissociated 
cell data set and the Patch-seq data set of this 
study. 


SMART-seq v4 RNA-sequencing 


The SMART-Seq v4 Ultra Low Input RNA Kit 
for Sequencing (Takara #634894) was used per 
the manufacturer’s instructions. Standard con- 
trols were processed with each batch of ex- 
perimental samples as previously described. 
After reverse transcription, cDNA was ampli- 
fied with 21 PCR cycles. The NexteraXT DNA 
Library Preparation (Illumina FC-131-1096) kit 
with NexteraXT Index Kit V2 Sets A-D (FC-131- 
2001, 2002, 2003, or 2004) was used for se- 
quencing library preparation. Libraries were 
sequenced on an Illumina HiSeq 2500 instru-
ment (Illumina HiSeq 2500 System, RRID:
SCR_016383) using Illumina High Output V4 
chemistry. The following instrumentation 
software was used during data generation work-
flow: SoftMax Pro v6.5; VWorks v11.3.0.1195
and v13.1.0.1366; Hamilton Run Time Control 
v4.4.0.7740; Fragment Analyzer v1.2.0.11; Mantis 
Control Software v3.9.7.19. 


SMART-seq v4 gene expression quantification


Raw read (fastq) files were aligned to the GRCh38 
human genome sequence (Genome Reference 
Consortium, 2011) with the RefSeq transcriptome 
version GRCh38.p2 (RefSeq, RRID:SCR_003496, 
current as of 4/13/2015) and updated by remov- 
ing duplicate Entrez gene entries from the gtf 
reference file for STAR processing. For alignment, 
Illumina sequencing adapters were clipped from 
the reads using the fastqMCF program (from 
ea-utils). After clipping, the paired-end reads 
were mapped using Spliced Transcripts Align-
ment to a Reference (STAR v2.7.3a, RRID:SCR_015899)
using default settings. Reads that did
not map to the genome were then aligned to syn- 
thetic construct (that is ERCC) sequences and the 
E. coli genome (version ASM584v2). Quantifi-
cation was performed using summarizeOverlaps
from the R package GenomicAlignments v1.18.0.
Expression counts were calculated as counts
per million (CPM) of exonic plus intronic reads. 
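
As an illustration of the last step, CPM values can be derived from exonic and intronic counts as follows. This is a minimal sketch under stated assumptions, not the pipeline code: the function name and the dictionary-based inputs are hypothetical, and the library size is taken to be the sum of exonic plus intronic counts over all genes.

```python
def counts_per_million(exonic, intronic):
    """Return CPM per gene from exonic and intronic read counts.

    Both inputs are dicts mapping gene name to read count (hypothetical
    layout); genes missing from one dict are treated as zero there.
    """
    genes = sorted(set(exonic) | set(intronic))
    # Total gene count = exonic reads + intronic reads.
    total = {g: exonic.get(g, 0) + intronic.get(g, 0) for g in genes}
    library_size = sum(total.values())
    if library_size == 0:
        return {g: 0.0 for g in genes}
    return {g: 1e6 * count / library_size for g, count in total.items()}
```

By construction the CPM values of one sample sum to one million, which makes expression comparable between cells sequenced to different depths.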


Anatomical Annotations 
Layer annotation and alignment 


To characterize the position of biocytin-labeled 
cells, a 20x brightfield and fluorescent image 
of DAPI (4’,6-diamidino-2-phenylindole) stained 
tissue was captured and analyzed to determine 
layer position. Using the brightfield and DAPI 
image, soma position and laminar borders were 
manually drawn for all neurons and were used 
to calculate depth relative to the pia, white matter, 
and/or laminar boundaries. Laminar locations 
were calculated by finding the path connect- 
ing pia and white matter that passed through 
the cell’s soma coordinate, and measuring dis- 
tance along this path to laminar boundaries, 
pia and white matter. 

For reconstructed neurons, laminar depths 
were calculated for all segments of the mor- 
phology, and these depths were used to create 
a “layer-aligned” morphology by first rotating 
the pia-to-WM axis to vertical, then projecting 
the laminar depth of each segment onto an av- 
erage cortical layer template. 


Human brain region pinning 


Available surgical photodocumentation (MRI 
or brain model annotation) is used to place the 
human tissue blocks in approximate 3D space by 
matching the photodocumentation to an MRI
reference brain volume “ICBM 2009b Nonlinear
Symmetric” (73), with Human CCF overlaid
(74) within the ITK-SNAP interactive software. 


Morphological Reconstruction 


A horseradish peroxidase (HRP) enzyme reac- 
tion using diaminobenzidine (DAB) as the 
chromogen was used to visualize the filled cells 
after electrophysiological recording, and 4’,6-
diamidino-2-phenylindole (DAPI) stain was
used to identify cortical layers as described 
previously (75). 


Imaging of biocytin-labeled neurons 


Mounted sections were imaged as described 
previously (48). In brief, operators captured im- 
ages on an upright AxioImager Z2 microscope 
(Zeiss, Germany) equipped with an Axiocam 
506 monochrome camera and 0.63x Optivar 
lens. Two-dimensional tiled overview images 
were captured with a 20x objective lens (Zeiss 
Plan-NEOFLUAR 20x/0.5) in bright-field trans-
mission and fluorescence channels. Tiled im-
age stacks of individual cells were acquired at 
higher resolution in the transmission channel 
only for the purpose of automated and manual 
reconstruction. Light was transmitted using 
an oil-immersion condenser (1.4 NA). High- 
resolution stacks were captured with a 63x 
objective lens (Zeiss Plan-Apochromat 63x/ 
1.4 Oil or Zeiss LD LCI Plan-Apochromat 63x/ 
1.2 Imm Corr) at an interval of 0.28 µm (1.4 NA
objective) or 0.44 µm (1.2 NA objective) along
the z axis. Tiled images were stitched in ZEN 
software and exported as single-plane TIFF files. 


Morphological reconstruction 


Reconstructions of the dendrites and the axon 
were generated for a subset of neurons with 
good quality transcriptomics, electrophys- 
iology and biocytin fill. Prior to prioritizing a 
neuron for reconstruction, the location of the 
axon exiting the soma or dendrite is deter- 
mined by examining the neuron in the z-depth 
of the 63x image stack. We then assess the 
direction in which the axon is heading relative 
to the surface of the slice. Based on this as- 
sessment, we prioritize the reconstruction of 
neurons whose axons remain largely within 
the plane of the slice. Reconstructions were 
generated based on a 3D image stack that was 
run through a Vaa3D-based image processing 
and reconstruction pipeline (Peng et al., 2010). 
For some neurons images were used to gener- 
ate an automated reconstruction of the neuron 
using TReMAP (Zhou et al., 2016). Alterna- 
tively, initial reconstructions were created 
manually using the reconstruction software 
PyKNOSSOS (https://www.ariadne.ai/) or the 
citizen neuroscience game Mozak (Roskams
and Popović, 2016) (https://www.mozak.science/).
Automated or manually initiated reconstruc- 
tions were then extensively manually cor- 
rected and curated using a range of tools (for 
example, virtual finger and polyline) in the 
Mozak extension (Zoran Popovic, Center for 
Game Science, University of Washington) of 
Terafly tools (75, 76) in Vaa3D. Every attempt 
was made to generate a completely connected 
neuronal structure while remaining faithful to 
image data. If axonal processes could not be 
traced back to the main structure of the neu- 
ron, they were left unconnected. 


MERFISH data generation 


Human postmortem frozen brain tissue was 
embedded in Optimum Cutting Temperature 
medium (VWR, 25608-930) and sectioned on a
Leica cryostat at -17°C at 10 µm onto Vizgen
MERSCOPE coverslips. These sections were 
then processed for MERSCOPE imaging ac- 
cording to the manufacturer’s instructions. 
Briefly: sections were allowed to adhere to 
these coverslips at room temperature for 10 min 
prior to a 1 min wash in nuclease-free phosphate 
buffered saline (PBS) and fixation for 15 min in 
4% paraformaldehyde in PBS. Fixation was 
followed by three 5 min washes in PBS prior 
to a 1 min wash in 70% ethanol. Fixed sections
were then stored in 70% ethanol at 4°C prior to
use and for up to one month. Human sections 
were photobleached using a 150W LED array 
for 72 hours at 4°C prior to hybridization then 
washed in 5 ml Sample Prep Wash Buffer 
(VIZGEN 20300001) in a 5 cm petri dish. Sec-
tions were then incubated in 5 ml Formamide
Wash Buffer (VIZGEN 20300002) at 37°C for
30 min. Sections were hybridized by placing
50 µl of VIZGEN-supplied Gene Panel Mix onto
the section, covering with parafilm and incu- 
bating at 37°C for 36 to 48 hours in a humidified 
hybridization oven. Following hybridization, 
sections were washed twice in 5 ml Formamide 
Wash Buffer for 30 min at 47°C. Sections were 
then embedded in acrylamide by polymeriz- 
ing VIZGEN Embedding Premix (VIZGEN 
20300004) according to the manufacturer’s 
instructions. Sections were embedded by in-
verting sections onto 110 µl of Embedding Premix
and 10% Ammonium Persulfate (Sigma A3678) 
and TEMED (BioRad 161-0800) solution applied 
to a Gel Slick (Lonza 50640) treated 2x3 glass 
slide. The coverslips were pressed gently onto 
the acrylamide solution and allowed to polym-
erize for 1.5 h. Following embedding, sections
were cleared for 24 to 48 hours with a mixture of
VIZGEN Clearing Solution (VIZGEN 20300003) 
and Proteinase K (New England Biolabs P8107S) 
according to the manufacturer’s instructions.
Following clearing, sections were washed twice 
for 5 min in Sample Prep Wash Buffer (PN 
20300001). VIZGEN DAPI and PolyT Stain (PN 
20300021) was applied to each section for 15 min 
followed by a 10 min wash in Formamide Wash 
Buffer. Formamide Wash Buffer was removed 
and replaced with Sample Prep Wash Buffer 
during MERSCOPE set up. 100 µl of RNAse
Inhibitor (New England BioLabs M0314L) was
added to 250 µl of Imaging Buffer Activator
(PN 203000015) and this mixture was added 
through the cartridge activation port to a pre- 
thawed and mixed MERSCOPE Imaging car- 
tridge (VIZGEN PN1040004). 15 ml mineral oil 
(Millipore-Sigma m5904-6X500ML) was added 
to the activation port and the MERSCOPE fluidics 
system was primed according to VIZGEN instruc- 
tions. The flow chamber was assembled with the 
hybridized and cleared section coverslip ac- 
cording to VIZGEN specifications and the im- 
aging session was initiated after collection of 
a 10X mosaic DAPI image and selection of 
the imaging area. For specimens that passed 
minimum count threshold, imaging was ini- 
tiated, and processing completed according to 
VIZGEN proprietary protocol. Following im-
age processing and segmentation, cells with
fewer than 50 transcripts were eliminated, as well
as cells with volumes falling outside a range of
100 to 300 µm³.

The 140-gene Human cortical panel was se- 
lected using a combination of manual and algo- 
rithmic based strategies requiring a reference 
single cell/nucleus RNA-seq dataset from 
the same tissue, in this case the human MTG 
snRNAseq dataset and resulting taxonomy (1).
First, an initial set of high-confidence marker 
genes are selected through a combination of 
literature search and analysis of the reference 
data. These genes are used as input for a greedy 
algorithm (detailed below). Second, the refer-
ence RNA-seq data set is filtered to only in-
clude genes compatible with mFISH. Retained
genes need to be (i) long enough to allow probe
design (>960 base pairs); (ii) expressed highly
enough to be detected (FPKM ≥ 10), but not so
high as to overcrowd the signal of other genes
in a cell (FPKM < 500); (iii) expressed with low
expression in off-target cells (FPKM < 50 in 
nonneuronal cells); and (iv) differentially ex- 
pressed between cell types (top 500 remain- 
ing genes by marker score20). To more evenly 
sample each cell type, the reference dataset is 
also filtered to include a maximum of 50 cells 
per cluster. 

The main step of gene selection uses a greedy 
algorithm to iteratively add genes to the initial 
set. To do this, each cell in the filtered reference 
data set is mapped to a cell type by taking the 
Pearson correlation of its expression counts 
with each cluster median using the initial gene 
set of size n, and the cluster corresponding to 
the maximum value is defined as the “mapped 
cluster.” The “mapping distance” is then de- 
fined as the average cluster distance between 
the mapped cluster and the originally assigned 
cluster for each cell. In this case a weighted clus- 
ter distance, defined as one minus the Pearson 
correlation between cluster medians calculated 
across all filtered genes, is used to penalize cases 
where cells are mapped to very different types, 
but an unweighted distance, defined as the 
fraction of cells that do not map to their assigned 
cluster, could also be used. This mapping step 
is repeated for every possible n + 1 gene set in
the filtered reference data set, and the set with 
minimum cluster distance is retained as the 
new gene set. 

These steps are repeated using the new gene
set (of size n + 1) until a gene panel of the
desired size is attained. Code for reproduc- 
ing this gene selection strategy is available as 
part of the mfishtools R library (https://github. 
com/AllenInstitute/mfishtools). 
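
The greedy selection loop described above can be sketched in a few lines. The authors' implementation is the mfishtools R library; the Python below is only an illustrative sketch with hypothetical function names and data layout (a dict of per-gene expression lists plus one cluster label per cell). It uses the weighted cluster distance, breaks ties by gene order, and assumes an initial panel of at least two genes so that correlations are defined.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def greedy_panel(expr, labels, initial, panel_size):
    """expr: {gene: [value per cell]}; labels: assigned cluster per cell."""
    genes = list(expr)
    clusters = sorted(set(labels))
    # Cluster median of every gene.
    med = {g: {c: median(v for v, l in zip(expr[g], labels) if l == c)
               for c in clusters} for g in genes}
    # Weighted cluster distance: 1 - correlation of medians over all genes.
    dist = {(a, b): 0.0 if a == b else
            1.0 - pearson([med[g][a] for g in genes],
                          [med[g][b] for g in genes])
            for a in clusters for b in clusters}

    def mapping_distance(panel):
        total = 0.0
        for i, assigned in enumerate(labels):
            cell = [expr[g][i] for g in panel]
            # Mapped cluster = cluster median most correlated with the cell.
            mapped = max(clusters, key=lambda c: pearson(
                cell, [med[g][c] for g in panel]))
            total += dist[(assigned, mapped)]
        return total / len(labels)

    panel = list(initial)
    while len(panel) < panel_size:
        candidates = [g for g in genes if g not in panel]
        if not candidates:
            break
        # Keep the n + 1 gene set with the smallest mapping distance.
        panel.append(min(candidates,
                         key=lambda g: mapping_distance(panel + [g])))
    return panel
```

Each iteration evaluates every remaining gene once, so the cost grows with the number of candidate genes times the number of cells, which is one reason the reference data are first filtered and subsampled.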

H5ad creation: Any genes not matched across 
both the MERSCOPE gene panel and the map- 
ping taxonomy were filtered from the dataset 
before starting. From there, cluster means were 
calculated by dividing the number of cells per 
cluster by the number of clusters collected. Next, 
we created a training dataset by finding marker 
genes for each cluster by calculating the L2 norm
between all clusters and the mean counts of
each gene per cluster. This training dataset
was fed into a k-nearest neighbors (kNN) classifier alongside the MERSCOPE
cell-by-gene panel to iteratively calculate best
possible gene matches per cluster. All scripts 
and data used are available at: https://github. 
com/AllenInstitute/. 


Analysis 
Patch-seq data curation and mapping 


To evaluate the mapping quality of Patch-seq 
samples, we calculated the NMS score - a ratio 
of “on” and “off” markers by subclass (14, 16, 40).
A value of 0.4 was designated as a cut off for low-
and high-quality data. Only cells mapping to 
GABAergic interneuron types and that had a 
NMS score >0.4 were included in the analyses 
for this study. 

Reference transcriptomic data used in this 
study were obtained from dissociated nuclei 
collected from human MTG (1), and are pub-
licly accessible at the Allen Brain Map data
portal (https://portal.brain-map.org/atlases-and- 
data/rnaseq). This taxonomy consists of a hi- 
erarchical dendrogram of cell types, along with 
a set of marker genes defined to distinguish 
types at each split in the tree. The Patch-seq 
transcriptomes were mapped to the reference 
taxonomy following the tree-mapping method 
(map_dend_membership in the scrattch.hicat 
package) as described previously (36, 48). Brief- 
ly, at each branch point of the taxonomy we 
computed the correlation of the mapped cell’s 
gene expression with that of the reference cells 
on each branch, using the markers associated 
with that branch point (that is, the genes that 
best distinguished those groups in the reference), 
and chose the most correlated branch. The pro- 
cess was repeated until reaching the leaves of 
the taxonomy. To determine the confidence of 
mapping, we applied 100 bootstrapped itera-
tions at each branch point, and in each iteration
70% of the reference cells and 70% of markers 
were randomly sampled for mapping. The per- 
centage of times a cell was mapped to a given 
transcriptomic type was defined as the map- 
ping probability, and the highest probability 
transcriptomic type was assigned as the mapped 
cell type. 
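
The bootstrapped tree mapping can be illustrated with a simplified sketch. The method used in this study is map_dend_membership in the scrattch.hicat R package; the Python below is not that implementation. The nested-dict tree, the function names, and the use of a mean reference profile per branch are assumptions made for illustration, and each of the 100 iterations here walks the whole tree while resampling 70% of reference cells and 70% of markers at every branch point.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def leaves(node):
    """Names of all leaf types below a node."""
    if not node.get("children"):
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


def map_once(query, node, ref, rng, frac=0.7):
    """Walk from the root to a leaf, resampling at each branch point."""
    while node.get("children"):
        markers = node["markers"]
        k = min(len(markers), max(2, round(frac * len(markers))))
        genes = rng.sample(markers, k)
        best, best_r = None, -2.0
        for child in node["children"]:
            cells = [c for leaf in leaves(child) for c in ref[leaf]]
            sub = rng.sample(cells, max(1, round(frac * len(cells))))
            # Mean expression of the sampled reference cells on this branch.
            profile = [sum(c[g] for c in sub) / len(sub) for g in genes]
            r = pearson([query[g] for g in genes], profile)
            if r > best_r:
                best, best_r = child, r
        node = best
    return node["name"]


def map_with_confidence(query, tree, ref, n_iter=100, seed=0):
    """Return the most frequent leaf and the mapping probabilities."""
    rng = random.Random(seed)
    votes = Counter(map_once(query, tree, ref, rng) for _ in range(n_iter))
    probs = {leaf: count / n_iter for leaf, count in votes.items()}
    return max(probs, key=probs.get), probs
```

The mapping probability of a type is simply the fraction of iterations that end at that leaf, so a cell that sits between two types will show split probabilities instead of a single confident call.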


Electrophysiology feature analysis 


For all electrophysiology stimuli that elicited 
spiking, APs were detected by first identify-
ing locations where the smoothed derivative
of the membrane potential (dV/dt) exceeded
20 mV ms⁻¹, then refining on the basis of several
criteria including threshold-to-peak voltage, 
time differences and absolute peak height. 
For each AP, threshold, height, width (at half- 
height), fast after-hyperpolarization (AHP) and 
interspike trough were calculated (trough and 
AHP were measured relative to threshold), along 
with maximal upstroke and downstroke rates
dV/dt and the upstroke/downstroke ratio (that 
is, ratio of the peak upstroke to peak down- 
stroke). Following spike detection, summary 
features were calculated from sweeps with long 
square pulse current injection: input resistance 
(all hyperpolarizing sweeps, -10 to -90 pA), sag 
(hyperpolarizing sweep with response closest 
to -100 mV, generally -90 pA stimulus, and de- 
polarizing sag on subthreshold response closest 
to rheobase), rheobase, and f-I slope (all five 
spiking sweeps, up to rheobase +80 pA). Spike 
train properties were calculated for each spik- 
ing sweep: latency, average firing rate, initial 
instantaneous firing rate (inverse of first ISI), 
mean and median ISI, ISI CV, irregularity ra-
tio, and adaptation index. These spike train
features and the single spike properties listed 
above (measured on the first AP) were sum- 
marized for both the rheobase sweep and a 
stimulus 40 pA above rheobase. For spike up-
stroke, downstroke, width, threshold, and
interspike interval (ISI), “adaptation ratio” fea- 
tures were calculated as a ratio of the spike 
features between the first and third spike (on 
the lowest amplitude stimulus to elicit at least 
four spikes). Spike shape properties were also 
calculated for short (3 ms) pulse stimulation 
and a slowly increasing current ramp stimulus 
(first spike only). A subset of cells also had sub-
threshold frequency response characterized
by a logarithmic chirp stimulus (sine wave with 
exponentially increasing frequency), for which 
the impedance profile was calculated and char- 
acterized by features including the peak fre- 
quency and peak ratio. Feature extraction was 
implemented using the IPFX python package; 
custom code used for chirps and some high- 
level features will be released in a future ver- 
sion of IPFX. 
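
The first stage of spike detection, a dV/dt threshold crossing followed by refinement, can be sketched as follows. This is not the IPFX implementation. The function name is hypothetical, the derivative is not smoothed here, and the refinement values (minimum threshold-to-peak height of 2 mV, minimum absolute peak of -30 mV, peak search window of 5 ms) are assumptions chosen for illustration; only the 20 mV/ms cutoff comes from the text.

```python
def detect_spikes(v_mv, dt_ms, dvdt_cutoff=20.0, min_height=2.0,
                  min_peak=-30.0, max_interval_ms=5.0):
    """Return (threshold_index, peak_index) pairs for putative APs.

    v_mv is the membrane potential in mV sampled every dt_ms milliseconds.
    """
    # Forward-difference derivative in mV/ms (unsmoothed in this sketch).
    dvdt = [(v_mv[i + 1] - v_mv[i]) / dt_ms for i in range(len(v_mv) - 1)]
    window = int(max_interval_ms / dt_ms)
    spikes = []
    i = 1
    while i < len(dvdt):
        # An upward crossing of the cutoff marks a putative threshold.
        if dvdt[i - 1] < dvdt_cutoff <= dvdt[i]:
            end = min(len(v_mv), i + window + 1)
            peak = max(range(i, end), key=lambda j: v_mv[j])
            # Refine on threshold-to-peak height and absolute peak height.
            if v_mv[peak] - v_mv[i] >= min_height and v_mv[peak] >= min_peak:
                spikes.append((i, peak))
                i = peak  # resume the search after this peak
        i += 1
    return spikes
```

Once threshold and peak indices are known, per-spike features such as height and width at half-height follow from the voltage values between those indices.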


Morphology feature analysis 


Prior to morphological feature analysis, recon- 
structed neuronal morphologies were expanded 
in the dimension perpendicular to the cut sur- 
face to correct for shrinkage (77, 78) after tissue 
processing. The amount of shrinkage was 
calculated by comparing the distance of the 
soma to the cut surface during recording and 
after fixation and reconstruction. For mouse 
cells, a tilt angle correction was also performed 
based on the estimated difference (through 
CCF registration) between the slicing angle 
and the direct pia-white matter direction at the 
cell’s location (48). Features predominantly 
determined by differences in the z-dimension 
were not analyzed to minimize technical ar- 
tifacts due to z-compression of the slice after 
processing. 

Morphological features were calculated as 
previously described (48). In brief, feature def- 
initions were collected from prior studies 
(79, 80). Features were calculated using the 
skeleton_keys Python package (https://github.
com/AllenInstitute/skeleton_keys). Features
were extracted from neurons aligned in the di-
rection perpendicular to pia and white matter.
Laminar axon histograms (bin size of 5 mi-
crons) and earth mover’s distance features
require a layer-aligned version of the mor- 
phology where node depths are registered to 
an average laminar depth template. 


Statistical analysis of variability 


To assess the variability of morphological fea- 
tures within the acute and culture paradigm, 
we used a Mann-Whitney U test for each con- 
dition. Results were reported as the resulting 
U statistic and the P value. P values were cor-
rected for false discovery rate (FDR, Benjamini-
Hochberg procedure). Feature relationships
across subclasses were assessed using
a one-way ANOVA on ranks (KW, Kruskal-
Wallis test) with FDR correction, and post hoc
Dunn’s tests were run for pairwise com-
parisons, with an FDR correction when neces-
sary. Feature relationships with other
variables including days in culture and qual- 
itative morphology types were likewise assessed 
by Mann-Whitney tests for binary variables, KW 
test for categorical, and Pearson’s correlations 
for continuous variables, FDR-corrected across 
features when necessary. 
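
For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment applied to a set of P values can be sketched as below. This is a from-scratch illustration with a hypothetical function name, not the code used in this study; in practice the same adjustment is available as multipletests with method="fdr_bh" in the statsmodels package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never decrease with rank (the step-up monotonicity constraint).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A feature is then called significant at a chosen FDR level when its adjusted P value falls below that level.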

Unless otherwise specified, statistical analyses 
of morphological features were implemented 
in python using both the statsmodels and scipy 
packages. Statistical analyses of electrophysio- 
logical features were implemented in Prism or 
python using both the statsmodels and scipy 
packages. 


Morphological Clustering 


The co-clustering matrix for the SST and PVALB 
subclass morphology dataset was calculated 
by iterative random sampling. During each 
iteration, 95% of samples were randomly se- 
lected to create a shared nearest neighbors 
graph. We then applied the Fast-greedy com- 
munity detection algorithm using the Python 
package python-igraph for clustering assign- 
ment. For each pair of samples, the coclustering 
score was defined as the times of coclustering 
normalized by the iterations of co-occurring. 
Resampling was performed 500 times to reach 
saturation. Agglomerative clustering using ward 
linkage was performed on the coclustering matrix 
to get clusters. 
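
The resampling scheme behind the co-clustering matrix can be sketched as follows. This is an illustration and not the analysis code: the function names are hypothetical, and the shared-nearest-neighbor graph with Fast-greedy community detection is replaced by a user-supplied clustering function (here a trivial one-dimensional stand-in) so that the example stays self-contained. The final Ward-linkage step on the resulting matrix is not shown.

```python
import random


def coclustering_matrix(samples, cluster_fn, n_iter=500, frac=0.95, seed=0):
    """Fraction of resampling iterations in which two samples co-cluster,
    normalized by the number of iterations in which both were drawn."""
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    cosampled = [[0] * n for _ in range(n)]
    k = max(2, round(frac * n))
    for _ in range(n_iter):
        idx = rng.sample(range(n), k)
        # cluster_fn returns one label per sampled item, in order.
        labels = cluster_fn([samples[i] for i in idx])
        for a, label_a in zip(idx, labels):
            for b, label_b in zip(idx, labels):
                cosampled[a][b] += 1
                if label_a == label_b:
                    together[a][b] += 1
    return [[together[a][b] / cosampled[a][b] if cosampled[a][b] else 0.0
             for b in range(n)] for a in range(n)]


def threshold_clusters(values, cut=5.0):
    """Stand-in clustering (an assumption): split 1-D values at a cut."""
    return [int(v > cut) for v in values]
```

Normalizing by co-occurrence rather than by the total number of iterations keeps the score unbiased for pairs that happen to be drawn together less often.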


Calculating differentially expressed genes 


Differentially expressed (DE) genes were calcu- 
lated for several analyses using the “FindMarkers” 
function from Seurat V4 (https://satijalab.org/ 
seurat/) (81). This function performs either a
Wilcoxon test or a t test to calculate a P value and
Bonferroni correction to determine whether 
genes are DE in two groups. This was used to 
find DE genes in the following cases: (i) cul- 
ture microglial genes: cultured cells with and 
without microglia signature (test.use = "wilcox",
min.pct = 0.5, logfc.threshold = 4); (ii) generic 
microglial genes: common genes with and 
without microglia signature (wilcox, 0.25, 2) in 
acute and culture cells; (iii) genes increasing
or decreasing in PVALB cells in culture: com- 
mon genes DE in acute versus culture cells 
(wilcox, 0.25, 2) with and without microglia 
signature; and (iv) morphology type genes: 
genes DE between cells with any pair of DBC, 
MC, and Sparse SST morphologies (t test, 0, 0).
Genes DE between dissociated cells in SST and 
PVALB cell types were calculated using the 
RNA-seq Data Navigator tool (https://celltypes. 
brain-map.org/rnaseq/human/mtg), by defining
“Set 1 Selection” and “Set 2 Selection” as all PVALB
and SST types (excluding SST FRZB) and then 
running “Find Marker Genes.” 


Assigning cells a microglial signature 


Microglial signature genes were initially de- 
fined in cells from culture by following the 
standard Seurat pipeline for clustering. First, 
the top variable genes were defined using 
FindVariableFeatures Seurat function with 
default parameters. The data was then scaled 
(using ScaleData), and the dimensionality was 
reduced by calculating the first 20 principal com- 
ponents (RunPCA), and then generating a 2D 
UMAP. Finally, cells were clustered into two 
clusters by running FindNeighbors on the UMAP 
space and FindClusters with resolution = 0.001. 
These clusters were used as initial cell sets to 
define 269 culture microglial genes, as described 
above. Final microglial calls were defined by re- 
running this clustering process independently 
on cells from culture and cells from acute dis- 
sections but using the 269 culture microglial 
genes as manually input variable genes and 
setting clustering resolution to 0.1. 


Visualization of integrated transcriptomic space 


To account for the non-negligible gene ex- 
pression signatures of culture and microglia, 
we divided the Patch-seq dataset into four 
groups (acute, acute - microglial, culture, culture - 
microglial) and integrated these data together 
with dissociated reference nuclei by following 
the Seurat tutorial for integration (87). To do 
this we applied the “FindIntegrationAnchors”
function setting the union of all tree-mapping 
marker genes as the set of anchor features, 
with dissociated cells as the reference data set. 
We then applied IntegrateData using default 
parameters to put all cells in the same tran- 
scriptomic space. We then scaled the data, 
reduced the dimensionality using principal 
component analysis (PCA; 30 PCs), and vi- 
sualized the results with 2D UMAPs. Finally, 
metadata such as cluster, paradigm, microg- 
lial status, and morphology type were then 
overlaid onto this UMAP space using different 
colored or shaped points for all or a subset of 
cells. Except as noted below, all transcriptom- 
ics UMAPs are presented in this UMAP space. 

The UMAP in figure S1C was generated using
a union of the top 50 generic microglial genes, 
50 acute cell markers, and 50 culture cell mark- 
ers as variable genes, and following the above 
process without using data integration. Likewise, 
the UMAP in Fig. 4D follows the same proce- 
dure only on SST CALB1 neurons, setting the 
253 genes DE between DBC, MC, and Sparse
SST neurons as variable genes. 
INTRODUCTION: The neocortex contributes to
higher-order cognitive processes in part by 
generating predictions about the external world 
and comparing them with sensory informa- 
tion. This process is mediated by the complex 
architecture and circuitry of the neocortex, 
which in most cortical areas consists of six 
layers. The outermost layer, layer 1 (L1), is 
thought to be where internally generated pre- 
dictions are compared with sensory signals via 
a confluence of long-range feedback axons and
local-neuron dendrites. Local inhibitory inter-
neurons complete the L1 architecture and play
a strong modulating role in these feedback 
connections. 


RATIONALE: Studies in rodents have shown
that different types of L1 interneurons perform
different functions. Distinct cell types have
also been noted in human L1, but their function
and gene expression have not been fully
characterized. Single-cell genomics is revolutionizing
the study of cellular diversity in the brain
through the construction of transcriptomic
cell-type taxonomies. As described in companion
papers, these methods can capture the exceptional
diversity of the brain and allow comparative
analysis across species, providing a framework
to bound the problem of cellular diversity.
However, this framework remains disconnected
from cellular function unless physiology,
morphology, and gene expression are measured
simultaneously (an approach known as “Patch-seq”),
which annotates these cell types with their
functional properties. We applied Patch-seq with brain
tissue from human neurosurgical patients to
reveal cell-type diversity in human neocortical
L1 and to make principled cross-species
comparisons.


Innovations in human layer 1 interneurons. (A) L1 consists of long-range axons (blue), pyramidal
neuron dendrites, and diverse GABAergic interneurons. (B) Aligning transcriptomic cell-type taxonomies
between human and mouse defined putative homologous L1 interneuron subclasses. Patch-seq experiments
in brain slices from neurosurgical patients annotated transcriptomic cell types with morphoelectric
properties. Cross-species comparison revealed clear matches, divergence of specific subclasses, and
overall shifts in properties.


RESULTS: We leveraged single-cell transcrip- 
tomics to define neuronal cell types in human 
and mouse L1 and to quantitatively identify
homologous subclasses across species. Although 
established marker genes for L1 cell types were 
not well conserved from mouse to human, 
whole-transcriptome analysis enabled cross- 
species alignment. We observed differences 
in the abundance of L1 cell types, including
cell types found in human but not mouse L1, 
and stronger differences in gene expression be- 
tween human types than between mouse types. 

Analysis of Patch-seq data showed that L1 
cell types in both human and mouse were dif- 
ferentiated by morphological and electrophy- 
siological features related to basic neuronal 
function. However, the features that distin- 
guish cell types were different in mouse versus 
human L1. We highlighted and characterized
two cell types in human L1 with distinctive 
phenotypes unmatched by their putative mouse 
homologues: the “rosehip” cell type with com- 
pact axonal structure and large rosehip-like pre- 
synaptic boutons and a larger PAX6-expressing 
type that fired high-frequency action potential 
bursts at the onset of stimulation. 


CONCLUSION: Our analysis provides tools for 
understanding human L1 diversity and sug- 
gests hypotheses and paths for further inves- 
tigation. The observed cross-species differences 
suggest differential regulation of higher-order 
input to the neocortical circuit in human and 
mouse. We propose that the dramatic expansion 
in primates of deeper layers of neocortex and the 
increased cellular diversity that accompanied it 
necessitated new types of inhibition in L1.
Neocortical layer 1 (L1) is a site of convergence between pyramidal-neuron dendrites and feedback
axons where local inhibitory signaling can profoundly shape cortical processing. Evolutionary expansion 
of human neocortex is marked by distinctive pyramidal neurons with extensive L1 branching, but 
whether L1 interneurons are similarly diverse is underexplored. Using Patch-seq recordings from human 
neurosurgical tissue, we identified four transcriptomic subclasses with mouse L1 homologs, along
with distinct subtypes and types unmatched in mouse L1. Subclass and subtype comparisons showed
stronger transcriptomic differences in human L1 and were correlated with strong morphoelectric 
variability along dimensions distinct from mouse L1 variability. Accompanied by greater layer thickness 
and other cytoarchitecture changes, these findings suggest that L1 has diverged in evolution, 
reflecting the demands of regulating the expanded human neocortical circuit. 


Neocortical layer 1 (L1) is implicated in
several higher-order brain functions, including
state modulation (1), learning
(2-5), sensory perception (6), and con- 
sciousness (7). The neural circuitry that 
mediates these functions consists of converging 
pyramidal-cell dendrites; long-range axons originating
from thalamic, cortical, and neuromodulatory
regions; and axons from local γ-aminobutyric
acid-producing (GABAergic)
interneurons (8). Although some of this inhib- 
itory input arises from other layers (Martinotti 
cells, for example), much of it arises from neurons
with cell bodies in L1, an entirely GABAergic
cell population with distinct developmental ori- 
gins (9, 10). Emerging evidence suggests that 
these L1 interneurons profoundly shape corti- 
cal processing and that diversity within this 
population is linked to diversity of function 
(11, 12). As such, the L1 interneuron repertoire 
is a potential site of evolutionary divergence 
that could contribute to specialized cortical 
function in humans and other primates. In ro- 
dents, a progression of classification schemes 
for L1 neurons (13-18) has evolved toward a 
view of four canonical types based on molec- 
ular markers (19), but the robustness of this 
scheme, both across modalities and across species,
remains unclear (particularly in human
and nonhuman primates). Indeed, the obser- 
vation of a “rosehip” cell type found in human 
and not mouse neocortex (20) highlights the 
importance of studying human L1 to identify
potential species specializations and to relate
mouse literature to human L1 cell types and
function. 

Traditionally, L1 cell types have been de- 
fined by their morphology, sublaminar location, 
intrinsic membrane properties, and a handful 
of marker genes. However, applying distinc- 
tive features from rodent to define and study 
human cell types can be tenuous. Instead, single- 
cell whole-transcriptome data can be leveraged 
to define cross-species cell-type homologies 
(21, 22) and reveal genetic and phenotypic di- 
versity obscured by the marker gene approach 
(23, 24), as observed in vivo in mouse L1 (7). 
The Patch-seq technique (25, 26), which com- 
bines patch-clamp electrophysiology, RNA se- 
quencing, and morphological reconstruction 
from the same neuron, gives us unprecedented 
ability to reveal cell-type diversity in human L1.
We leveraged these multimodal data to provide
a comprehensive view of cell-type distinctions 
previously proposed from a subset of modali- 
ties, make principled cross-species comparisons, 
and robustly identify distinct phenotypes found 
in human L1 across modalities.


Results 
L1 Patch-seq pipeline and 
transcriptomic references 


To guide our analysis of L1 cell types, we used 
transcriptomic types (t-types) previously de- 
fined in reference datasets from human middle 
temporal gyrus (MTG) and mouse primary visual
cortex (VISp), obtained with single-nucleus RNA
sequencing (snRNA-seq) in human and single-cell
RNA-seq (scRNA-seq) in mouse (21, 27). With
annotations from layer dissections as a guide,
we identified 10 L1 t-types in human and 8 in
mouse (Materials and methods and fig. S1A).
In uniform manifold approximation and pro- 
jection (UMAP) (28) views of transcriptomic 
data (Fig. 1A), many human t-types formed 
separated clusters, with others clustered in 
groups of related t-types, whereas mouse L1 
t-types showed more continuous variability. 
This contrast suggests stronger transcriptomic 
specialization in human L1, similar to that
found in supragranular excitatory neurons 
(29), and indicates that more robust groupings 
of L1 types into highly distinct transcriptomic 
subclasses can be delineated in human. We 
grouped related human t-types into L1-focused
transcriptomic subclasses by quantifying the 
pairwise distinctness of t-types in terms of a d’ 
separation of likelihoods (23, 24). This group- 
ing formed three subclasses, with three t-types 
remaining ungrouped (Fig. 1B). Expression of 
the inhibitory subclass markers PAX6 and 
LAMP5 (27, 30) and the t-type marker MC4R 
also closely matched these subclass bounda- 
ries (Fig. 1C). 
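
The grouping step described above can be sketched as follows. This is a minimal illustration, not the published procedure: it assumes the common form of the sensitivity index, d′ = |μa − μb| / sqrt((σa² + σb²)/2), applied to per-cell likelihood scores, and it assumes that t-types are merged by single linkage whenever their pairwise d′ falls below a threshold. The function names, the toy d′ values, and the threshold are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(scores_a, scores_b):
    """Sensitivity index between two samples of scores (assumed standard form)."""
    pooled_sd = sqrt((variance(scores_a) + variance(scores_b)) / 2)
    return abs(mean(scores_a) - mean(scores_b)) / pooled_sd


def group_by_threshold(names, pairwise_d, threshold):
    """Merge t-types whose pairwise d' is below threshold (single linkage)."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in pairwise_d.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy values for illustration only (not taken from the study).
toy_d = {("A", "B"): 0.5, ("A", "C"): 4.0, ("B", "C"): 5.0}
subclasses = group_by_threshold(["A", "B", "C"], toy_d, threshold=2.0)
```

With these toy inputs, types A and B fall into one group and C remains ungrouped, which mirrors the outcome in which some t-types form subclasses and others stay on their own.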

Human subclass marker genes did not clearly
identify subclasses in mouse, posing a challenge
for cross-species comparison. Marker genes
were either not expressed in any mouse L1 type
(MC4R) or were expressed broadly and overlapped
with other markers (LAMP5) (Fig. 1C,
top). Similarly, markers previously suggested
for L1 subclasses in mouse (19) showed graded
or complete lack of expression in human L1
(Fig. 1C, bottom). Notably, the marker Id2,
suggested for distinguishing a class of interneurons
including all L1 types (Pvalb⁻/Sst⁻/Vip⁻)
(31), was consistently expressed across most L1
types in both species (with the exception of marginal
expression in ungrouped human t-types).

Fig. 1. snRNA-seq demonstrates L1 diversity and provides a reference for
Patch-seq transcriptomic mapping. (A) UMAP projections of (left) human and
(right) mouse gene expression for L1 t-types (single-neuron or -nucleus RNA-seq).
(B) Human t-types grouped by thresholding transcriptomic distinctness d’,
defining subclasses. Remaining ungrouped t-types are marked as “L1 VIP” or
“other” according to cross-species homology results. (C) Expression of canonical
and t-type-specific marker genes across L1 t-types in (left) human and (right)
mouse. Pink background indicates human subclass markers; gray background
indicates classical mouse markers. Vertical lines group t-types by subclass. Violin
plots show normalized probability density of gene expression (shape width) and
median expression (dots), with expression in log(CPM+1) normalized by gene
for each species and maximal expression in counts per million (CPM) noted at
right. (D) Mouse t-types grouped with human t-types into homologous
subclasses (outlined) by thresholding similarity scores (heatmap intensity,
from cluster overlap in integrated transcriptomic space). Non-L1 t-types are
excluded, with maximal similarity over all non-L1 types shown for reference.
(E) Proportions of subclasses and unclassified t-types in L1 Patch-seq data, by
species. “Other L1 t-types” refers to t-types in human L1 with no mouse homolog
in L1. “Deeper t-types” refers to types found in L1 in lower proportions, not
meeting the criteria for core L1 t-types. All cross-species proportion differences
(except L1 VIP) significant at FDR-corrected P < 0.001, one-versus-rest
Fisher's exact tests.

Given the lack of conserved subclass mark- 
ers across species, we instead grouped mouse 
t-types for cross-species analysis by using cluster 
distances in an integrated transcriptomic space 
(21) (fig. S1C), identifying each mouse t-type
with the most similar human subclass or un- 
grouped t-type (Fig. 1D and fig. S1B). These 
matches resulted in four homology-driven 
subclasses (called subclasses hereafter) with 
proportions largely comparable across spe- 
cies (Fig. 1E; PAX6 is the notable exception), 
named by the subclass marker genes in human.
Two additional human L1 t-types (SST BAGE2
and VIP PCDH20) were excluded from cross- 
species L1 subclasses because of their homolo- 
gy to t-types found in deeper layers but not in 
L1 of mouse neocortex. This observation sug- 
gests that there is a shift in some of the inter- 
neuron diversity across laminar boundaries 
between mouse and human (32). Reinforcing 
the validity of these subclass divisions in mouse, 
we noted likely matches to previous mouse L1 
subclasses (19) based on marker-gene expression
(Fig. 1C and table S1): neurogliaform
(NGFC) cells (Npy⁺/Ndnf⁺) to LAMP5, canopy
cells (Npy⁻/Ndnf⁺) to MC4R, and α7 cells
(Ndnf⁻/Vip⁻/Chrna7⁺) to PAX6 (11). However,
uncertainty in these matches highlights the 
need for further confirmation based on morpho- 
electric properties. 
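
The matching rule described above (assign each mouse t-type to its most similar human subclass or ungrouped t-type, subject to a similarity threshold) can be sketched in a few lines. This is an illustrative sketch only: the similarity scores, the threshold, and the placeholder type names are invented, and the actual scores in the study come from cluster overlap in an integrated transcriptomic space.

```python
def match_mouse_types(similarity, threshold):
    """Map each mouse t-type to its best-scoring human group, or None if too weak.

    similarity: {mouse_type: {human_group: score}}
    """
    matches = {}
    for mouse_type, scores in similarity.items():
        best_group = max(scores, key=scores.get)
        # Keep the match only when the best score clears the threshold.
        matches[mouse_type] = best_group if scores[best_group] >= threshold else None
    return matches


# Invented scores for illustration only (not the study's values).
toy_similarity = {
    "Lamp5 Ntn1 Npy2r": {"LAMP5": 0.62, "MC4R": 0.10, "PAX6": 0.03, "L1 VIP": 0.01},
    "mouse type B": {"LAMP5": 0.08, "MC4R": 0.25, "PAX6": 0.04, "L1 VIP": 0.02},
    "mouse type C": {"LAMP5": 0.05, "MC4R": 0.04, "PAX6": 0.06, "L1 VIP": 0.03},
}
toy_matches = match_mouse_types(toy_similarity, threshold=0.2)
```

A type whose best score stays below the threshold is left unmatched (`None`) rather than forced into a subclass, which is one simple way to represent weak or absent homology.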

To characterize morphoelectric and transcrip- 
tomic diversity across human L1 cell types, we 
used a previously established pipeline for high- 
throughput data acquisition and analysis (26, 29) 
to generate and release a comprehensive L1
Patch-seq dataset. Human tissue was obtained
from surgical samples and processed with standardized
protocols; most samples originated
from the MTG, along with smaller fractions 
from other temporal and frontal areas (data S1). 
All cells were filtered for transcriptomic (n = 
250) and electrophysiological quality (n = 194), 
and a subset of neurons (n = 71) with sufficient
cell labeling was imaged at high resolution 
and their dendritic and axonal morphologies 
reconstructed. 

We assigned transcriptomic cell types and 
subclasses to Patch-seq samples by using a 
“tree mapping” classifier, a decision tree based 
on the transcriptomic taxonomy structure 
(Materials and methods) (24). Validating these 
assignments, we visualized t-type labels from 
Patch-seq and reference datasets in a joint 
UMAP projection by using alignment methods 
from the Seurat package (33) and found strong 
correspondence (fig. S1E). Additionally, marker
genes used by the classifier showed strong 
correlation by t-type between Patch-seq data 
and the snRNA-seq reference (fig. S1D).
Because Patch-seq sampling was not uniform
across cortical layers, we also measured
the laminar distribution of L1 t-types with a 
spatially resolved, robust, and reliable single-cell 
profiling technique [multiplexed error-robust 
fluorescence in situ hybridization (MERFISH) 
(32)] (Materials and methods). MERFISH lami- 
nar distributions were compatible with those 
from layer dissections of snRNA-seq, confirming
that human L1 t-types are predominantly
found in L1 or on the L1/L2 border and demonstrating
t-type-specific distributions across
deeper layers and within L1 for certain types
(fig. S2, B to D). Proportions of t-types within L1
were also generally matched between Patch-seq, 
snRNA-seq, and MERFISH (fig. S2A), with one 
exception: Patch-seq had fewer SST BAGE2 
cells and more PAX6 CDH12 and VIP TSPAN12 
cells compared with snRNA-seq [P < 0.05, false 
discovery rate (FDR)-corrected Fisher’s exact 
test]. MERFISH had intermediate proportions 
of SST BAGE2, with no significant differences 
compared with Patch-seq (P > 0.18), perhaps 
indicating differences caused by technical fac- 
tors in snRNA-seq only, such as imprecise layer 
dissections. 
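
The proportion comparisons above rely on Fisher's exact tests with FDR correction. The sketch below shows one way such a one-versus-rest comparison could be set up. It assumes a two-sided test and the Benjamini-Hochberg procedure (the text does not name the FDR method), and the cell counts are invented for illustration; only the t-type names are taken from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def one_versus_rest(counts_a, counts_b):
    """FDR-adjusted p-value per category, each category tested against all others."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    names = sorted(set(counts_a) | set(counts_b))
    raw = [
        fisher_exact_two_sided(
            counts_a.get(n, 0), total_a - counts_a.get(n, 0),
            counts_b.get(n, 0), total_b - counts_b.get(n, 0),
        )
        for n in names
    ]
    return dict(zip(names, benjamini_hochberg(raw)))


# Invented counts for illustration only (not the study's data).
patch_seq = {"SST BAGE2": 5, "PAX6 CDH12": 40, "VIP TSPAN12": 30, "other": 125}
snrna_seq = {"SST BAGE2": 60, "PAX6 CDH12": 50, "VIP TSPAN12": 40, "other": 450}
adjusted = one_versus_rest(patch_seq, snrna_seq)
```

Each category is compared against the pooled remainder, so a shift in one t-type's proportion is tested without assuming anything about how the other types are distributed among themselves.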


Morphoelectric diversity in human L1 


Organizing the Patch-seq dataset by transcriptomic
subclass revealed the exceptionally diverse
morphology and physiology of human L1 interneurons.
Morphologically, subclasses were distinguished
by vertical orientation of axons and
dendrites, axon extent and shape, and den- 
drite branching (Fig. 2, A and D; fig. S4; and 
data S2). Electrophysiologically, subclasses were 
distinguished by subthreshold properties such 
as the ratio of the steady-state to the transient 
peak of hyperpolarization (sag), as well as sev- 
eral suprathreshold properties, including firing 
rate, single-action-potential kinetics, and adap- 
tation of spike kinetics during trains of action 
potentials (Fig. 2, B and C, and data S2). Spike 
adaptation properties showed a strong inverse 
relationship with sag across the dataset (fig. 
S3A). Sag is often mediated by hyperpolarization- 
activated, cation-nonselective (HCN) channels 
(34) and spike broadening by specific K⁺ channels
(35, 36), so this finding may indicate a functional
relationship between these channels in
all subclasses of human L1 neurons.
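
As a concrete reading of the sag definition given above (the ratio of the steady-state to the transient peak of hyperpolarization), the following sketch computes that ratio from a voltage trace recorded during a hyperpolarizing current step. The function name, the fraction of the trace treated as steady state, and the example trace are assumptions for illustration; sag conventions differ between laboratories, and this is not necessarily the exact feature definition used in the dataset.

```python
def sag_ratio(trace_mv, baseline_mv, steady_fraction=0.1):
    """Steady-state deflection divided by peak deflection during a hyperpolarizing step.

    trace_mv: membrane potential samples (mV) during the step.
    baseline_mv: resting potential before the step (mV).
    steady_fraction: final fraction of the step averaged as steady state (assumed).
    """
    n_tail = max(1, int(len(trace_mv) * steady_fraction))
    peak_deflection = min(trace_mv) - baseline_mv
    steady_deflection = sum(trace_mv[-n_tail:]) / n_tail - baseline_mv
    if peak_deflection == 0:
        raise ValueError("trace shows no hyperpolarizing deflection")
    return steady_deflection / peak_deflection


# Toy trace (mV): a transient peak at -100 mV relaxing to -90 mV.
toy_trace = [-70, -100, -95, -90, -90, -90, -90, -90, -90, -90]
toy_sag = sag_ratio(toy_trace, baseline_mv=-70, steady_fraction=0.2)
```

Under this convention a ratio near 1 means almost no sag, and smaller values mean a larger rebound from the transient peak toward baseline.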

LAMP5 cells, the largest subclass, corre- 
sponded to the classical NGFC cell type (37), 
with highly branched, descending dendrites 
and horizontally elongated axons, with either 
a rectangular or triangular shape (Fig. 2A). Their 
electrophysiological phenotype was relatively 
undistinguished, with firing-rate adaptation 
and sag present but small (Fig. 2B). PAX6 cells 
had axons similar to those of LAMP5 cells, 
occasionally with descending branches, and 
sparse downward dendrites, along with mini- 
mal sag and high initial firing rate at the onset 
of response to depolarizing current injection 
(Fig. 2, A and B). MC4R cells had extremely
compact, ball-like axonal arbors, along with 
strong sag; on this basis, they were tentatively 
identified as a match to the recently discovered 
rosehip type (further characterized in the dis- 
cussion of distinctive neuronal phenotypes 
below) (20). L1 VIP (TSPAN12 t-type) cells had 
descending axon collaterals (13) with a con- 
sistent stellate-like dendrite morphology and 
high sag. The two cell types with no matching 
t-types within mouse L1, SST BAGE2 and VIP
PCDH20, showed extremely diverse dendritic
and axonal structure, often with substantial 
horizontal or descending axon branches—even 
avoiding L1 entirely in the case of some BAGE2 
cells. These t-types appeared more uniform 
electrophysiologically, with relatively small 
spikes and high adaptation and sag, but were 
sparsely sampled and thus difficult to fully 
characterize. 

In a few instances, we also observed differ- 
entiation between t-types within the same sub- 
class. Within the LAMP5 subclass, sag and 
initial firing rate decreased from the NMBR 
t-type to DBP to LCP2 t-types (Fig. 2C, strip 
plots within boxplot; Spearman correlation 
FDR-corrected, P < 107°). Whereas other LAMP5 
t-types were mostly restricted to L1 and super- 
ficial L2, the LCP2 t-type was found distributed 
across all cortical layers, with axonal arbors 
becoming less elongated and overlapping less 
with dendritic arbors for deeper cells (fig. S2, B 
and D, and fig. S3B). PAX6 cells were dis- 
tinguished by whether the initial high-frequency 
firing formed a discrete burst (TNFAIP8L3) or 
continuously adapted (CDH12), and the two 
MC4R t-types were distinguished by the magnitude
of sag and irregularity of firing (Fig. 2B).
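
The ordered trend reported above for the LAMP5 t-types (a feature decreasing from NMBR to DBP to LCP2) is the kind of relationship a Spearman rank correlation captures. The sketch below is a plain-Python illustration with average ranks for ties; encoding the t-types as the ordinal values 0, 1, and 2 is an assumption made here for illustration, and the feature values are invented.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Ordinal t-type code (0 = NMBR, 1 = DBP, 2 = LCP2) and an invented feature value.
t_type_code = [0, 0, 1, 1, 2, 2]
toy_feature = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]
rho = spearman_rho(t_type_code, toy_feature)
```

A strongly negative rho indicates that the feature falls monotonically along the assumed t-type order; significance testing and FDR correction would be applied on top of this statistic.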

Given the potential for the observed neuro- 
nal diversity to be determined in part by di- 
versity in tissue-donor characteristics, we 
tested all morphoelectric features for effects 
of donor medical condition, sex, and age (fig. 
S6, A and B, and data S3). Most effects were 
small and in features not linked to L1 diversity,
with the exception of higher dendritic
branching in cells from tumor patients com- 
pared with that found in epilepsy patients 
(fig. S6A); this result was not explained by brain 
area or subclass. 


Cross-species differences in L1 


Evolutionary expansion of layers 2 and 3 (L2/3) in
primates was previously linked to changes in 
cytoarchitecture—including thinning out of cell 
density and increased soma size—accompanying 
the specialization of pyramidal cell types (29). 
Therefore, we first quantified cytoarchitecture 
differences in L1 of human tissue samples 
[neuronal nuclei (NeuN)-stained slices from 
Patch-seq tissue blocks] compared with L1 in 
mouse samples (Fig. 3A). In contrast to cross-species
differences in L2/3, human L1 was
thicker but not less dense, with smaller cell
bodies as compared with several mouse neocortical
areas (fig. S6D).

To study cross-species differences in the mor- 
phoelectric properties of L1 neurons, we com- 
piled a comparison Patch-seq dataset from 
mouse L1 neurons (n = 272, 255 with electrophysiology
and 43 with morphology) consisting
of previously published data from a
cross-layer analysis of interneurons in VISp 
(24) and additional recordings in L1 and L2/3 
of visual cortex and the temporal association 
area (TEa). [TEa was held out as a validation 
set because this region is often compared with 
human MTG (38, 39).] Despite the differences 
in L1 cytoarchitecture, morphologies of L1 neurons
generally showed remarkable similarity
across species when comparing across matched 
homology-driven subclasses, with two excep- 
tions (Fig. 3C, figs. S4 and S5, and data S4). 
No mouse L1 neurons had morphologies resembling
the unmatched human L1 t-types
(VIP PCDH20 and SST BAGE2, homologous
to deeper mouse t-types), and the MC4R sub- 
class was less morphologically distinct, provid- 
ing evidence for type-specific divergence across 
species (fig. S5). Although human neurons 
across subclasses were slightly larger in hori- 
zontal extent, no differences were observed 
in vertical dimensions or dendritic diameter 
(fig. S3C). Mouse VIP cells had descending 
axon branches as in human VIP cells, but with 
greater variability of structure. Mouse LAMP5 
cells had dense NGFC-like axonal arbors, con- 
firming their match through Ndnf/Npy ex- 
pression. Unlike human axons, mouse axons 
rarely extended above dendrites (Fig. 3D, left), 
perhaps reflecting sublaminar structure found 
only in the thicker human L1. Human neurites
were also structured differently, with smaller 
contraction ratios (higher tortuosity) compared 
with the straighter but more heavily branched 
mouse dendrites (Fig. 3D and data S4). 
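
The contraction ratio mentioned above can be illustrated with a short calculation. The sketch assumes the usual definition for a neurite segment, the straight-line distance between its end points divided by its path length, so that a value of 1 is perfectly straight and smaller values are more tortuous. The coordinates are invented, and the study's exact feature definition may differ (for example, in how values are averaged over branches).

```python
from math import dist


def contraction(points):
    """End-to-end Euclidean distance divided by path length along the points."""
    path_length = sum(dist(p, q) for p, q in zip(points, points[1:]))
    if path_length == 0:
        raise ValueError("branch has zero path length")
    return dist(points[0], points[-1]) / path_length


# Toy 3D coordinates in micrometers (illustrative only).
straight_branch = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent_branch = [(0, 0, 0), (3, 0, 0), (3, 4, 0)]
straight_value = contraction(straight_branch)
bent_value = contraction(bent_branch)
```

The bent branch travels 7 µm to cover a 5 µm displacement, giving a lower contraction than the straight branch, which is the sense in which smaller contraction corresponds to higher tortuosity.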

As in previous studies, electrophysiological 
properties showed stronger differences across 
species (40). Mouse cells had less sag, lower 
amplitude spikes, and higher rheobase across 
types and within at least two of four matched 
subclasses (Fig. 3D and data S4). We replicated 
these findings in a comparison between MTG 
and a smaller TEa dataset, verifying that cross- 
species differences were not due to regional 
differences between MTG and VISp (Fig. 3E). 
Proportions of L1 t-types also varied little across 
brain regions in mouse and human single- 
neuron and -cell RNA-seq reference datasets 
(fig. S2B) (41). We also explored dependence of
L1 interneuron morphoelectric properties on
brain region within our human data and found 
moderate effects on a set of features including 
dendrite extent and input resistance (fig. S6C). 
For these features, more sparsely sampled 
regions either failed to differ from MTG cells 
or had smaller dendritic extent and larger input 
resistance, diverging more strongly from mouse
features, which suggests again that the cross-species
differences were not inflated by regional
sampling.

Fig. 2. Human L1 transcriptomic subclasses are morphoelectrically diverse.
(A) Example human morphologies for L1 t-types are displayed by subclass.
Neurons are shown aligned to an average cortical template, with histograms to
the right of the morphologies displaying average dendrite (darker color) and
axon (lighter color) branch length by cortical depth for all reconstructed cells in
L1 and L2 (shading shows ±1 SD about mean; soma locations are represented by
black circles). (B) Electrophysiology summary view by t-type and subclass. (Top)
Images show example spiking responses. Scale bar, 0.5 s, 20 mV. (Bottom)
Cell-by-cell summary traces are shown, with t-type average (curved black lines),
dataset average (curved dashed lines), and individual cells (curved colored
lines). (Top to bottom) Diagrams show the phase-plane (dV/dt versus V) plot of
first action potential. The instantaneous firing rate (IFR) and the hyperpolarizing
response (Hyperpol.) are normalized to peak. Spiking plots (example, phase-
plane, IFR) are at 40 pA above rheobase; hyperpolarizing response is at
membrane potential closest to -100 mV. Scale bar, 0.5 s, 20 mV. Counts are
found in table S2. AP, action potential. (C and D) Electrophysiological
and morphological features distinguishing L1 subclasses [Kruskal-Wallis (KW)
test, FDR-corrected P < 10°’ for electrophysiology, <10~° for morphology;
data S2]. Boxplots show subclass statistics [box marks quartiles; whiskers
extend 1.5 x interquartile range (IQR) past box], with individual cells
arranged horizontally by t-type. Significant pairwise comparisons marked by

We next investigated causal factors underly- 
ing cross-species electrophysiology differences. 
Simulations of passive biophysical models based 
on reconstructed morphologies showed that 
input-resistance variability in human but not 
mouse L1 cells can be explained by morphology,
suggesting that the lower input resistance
in mouse may be partly due to active ionic 
conductances (fig. S3G). The small cross-species 
effect of morphology on input resistance in the 
models could be explained by differences in 
dendritic branching, especially near the soma, 
which would affect the effective membrane 
area for leak conductance. Indeed, we found a 
higher peak of total dendrite cross-sectional
area at ~50 µm from the soma, as well as
slightly higher total volume, in mouse cells
(fig. S3F), supporting this explanation. We also
looked for correlated differences in ion-channel 
gene expression and morphological features 
compared with membrane properties in the 
Patch-seq dataset. Differences in spike shape 
and threshold could be explained by potassium- 
channel differences, along with related features 
such as rheobase and delayed spiking. Indeed, 
the expression of genes (KCND2, KCND3, and
KCNH7) associated with fast-inactivating
(A-type) voltage-gated K⁺ (Kv) channels [Kv4.2,
Kv4.3, and the ERG3 channel Kv11.3 (42, 43)]
was higher in mouse neurons and was cor- 
related with several action-potential features 
(fig. S3E). To test for corresponding differences 
in K⁺ channel conductance, we measured macroscopic
currents in nucleated patches following
whole-cell recording in a subset of cells. Compared
with human neurons, mouse neurons
showed much higher A-type K⁺ conductance but
comparable slow-inactivating (D-type) conductance
(Fig. 3F). Considering that blocking Kv4
channels in mouse NGFC cells decreases action-potential
threshold and latency of first action-potential
onset (44), these differences in A-type
K⁺ channel conductance, along with lack
of Kv1.1 expression, may contribute to the lack
of late spiking observed in human L1 NGFC
cells as well (45).
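
The link drawn earlier in this section between membrane area and input resistance can be illustrated with the simplest passive case. The sketch below treats a cell as a single isopotential compartment, for which R_in = R_m / A, with R_m the specific membrane resistance and A the membrane area summed over cylindrical segments. This is a deliberate simplification: it ignores axial resistance and dendritic attenuation, which simulations on reconstructed morphologies would include, and the default R_m of 20,000 Ω·cm² and the segment dimensions are assumed values for illustration only.

```python
from math import pi


def membrane_area_um2(segments):
    """Total lateral surface area (µm²) of cylinders given as (length_um, diameter_um)."""
    return sum(pi * diameter * length for length, diameter in segments)


def input_resistance_mohm(area_um2, rm_ohm_cm2=20000.0):
    """Input resistance (MΩ) of an isopotential passive cell: R_in = R_m / A."""
    area_cm2 = area_um2 * 1e-8  # 1 µm² = 1e-8 cm²
    return rm_ohm_cm2 / area_cm2 / 1e6


# Two toy cells (invented dimensions): the second has twice the dendritic length.
sparse_cell = [(100.0, 1.0)] * 10
branched_cell = [(100.0, 1.0)] * 20
r_sparse = input_resistance_mohm(membrane_area_um2(sparse_cell))
r_branched = input_resistance_mohm(membrane_area_um2(branched_cell))
```

In this simplified picture, doubling the membrane area halves the input resistance, which is the direction of effect expected if heavier branching near the soma increases the effective area for leak conductance.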

Lastly, we asked whether the strong mor- 
phoelectric variability observed between human 
L1 subclasses is also present in L1 of mouse
neocortex. Ranking electrophysiological and 
morphological features by the amount of va- 
riability between subclasses they explain, we 
found that the two species had a similar amount 
of variability (number of significantly different
features and their effect size) but varied along
different sets of features (Fig. 3G). The most 
distinct features in human, such as sag and 
spike-shape adaptation, showed little varia- 
bility in mouse, and unlike in human, mouse 
subclasses varied physiologically in interspike 
interval (ISI) adaptation and spike after- 
hyperpolarization (AHP) properties and mor- 
phologically in relative vertical positioning of 
the axonal arbor (most features are largely 
driven by L1 VIP subclass; figs. S3 and S5). 
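
Two of the firing features named above, instantaneous firing rate and interspike interval (ISI) adaptation, can be computed directly from spike times. In the sketch below, the instantaneous rate is the reciprocal of each ISI, and adaptation is summarized with a commonly used index, the mean of (ISI[i+1] − ISI[i]) / (ISI[i+1] + ISI[i]). That index is an assumption for illustration and is not necessarily the exact definition used for the dataset; the spike times are invented.

```python
def interspike_intervals(spike_times_s):
    """Intervals (s) between consecutive spikes."""
    return [t1 - t0 for t0, t1 in zip(spike_times_s, spike_times_s[1:])]


def instantaneous_rates(spike_times_s):
    """Instantaneous firing rate (Hz) for each interspike interval."""
    return [1.0 / isi for isi in interspike_intervals(spike_times_s)]


def isi_adaptation_index(spike_times_s):
    """Mean normalized change between consecutive ISIs (positive = slowing)."""
    isis = interspike_intervals(spike_times_s)
    if len(isis) < 2:
        raise ValueError("need at least three spikes")
    changes = [(b - a) / (b + a) for a, b in zip(isis, isis[1:])]
    return sum(changes) / len(changes)


# Toy spike trains (seconds): one adapting, one regular.
adapting_train = [0.0, 0.1, 0.3, 0.7]
regular_train = [0.0, 0.1, 0.2, 0.3]
adapting_index = isi_adaptation_index(adapting_train)
regular_index = isi_adaptation_index(regular_train)
```

The adapting train, whose intervals double at each step, yields a positive index, whereas the regular train yields an index near zero.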


Distinctive neuronal phenotypes in human L1 


Despite the quantitative similarity in L1 heterogeneity
across species, we noted two particularly
distinctive phenotypes found in human
L1 only. The MC4R rosehip cells and the bursting
PAX6 TNFAIP8L3 t-type were both qualitatively
distinct from other human L1 types
and did not appear similar to any mouse L1
types. To further highlight this contrast, we
investigated each of these highly distinctive types
in turn by quantifying the distinctive morphoelectric
features and marker genes, then searching
for comparable cells in the mouse L1 dataset.


Rosehip cells 


The MC4R subclass, putative rosehip cells, com- 
prises two transcriptomically similar t-types, SST 
CHRNA4 and ADARB2 MC4R, both highly 
distinct from other L1 types, including the
LAMP5 LCP2 t-type originally identified with 
the rosehip phenotype (20). MC4R morpholo- 
gies were all confirmed to qualitatively match 
the distinctive rosehip axonal structure and 
boutons (Fig. 4A and fig. S4) and were quan- 
titatively distinct from other L1 types in terms 
of maximum axonal path distance and branch 
frequency (Fig. 4B). We also noted two exam- 
ples of MC4R cells (both within the ADARB2 
MC4R t-type) with elaborate descending axons
reaching the lower half of L3 and confirmed 
that the characteristic large, dense axonal 
boutons were visible on both the central axonal 
arbor and descending axons when present (Fig. 
4A, right). Electrophysiologically, both t-types 
that comprise the MC4R subclass showed 
strong sag, but only the ADARB2 MC4R t-type
showed the distinctive irregular firing (45) and 
stronger and faster sag (Fig. 2B and Fig. 3, A 
and B). Cells in the ADARB2 MC4R t-type also
had somas and axons localized near the L1/L2 
border (fig. S2, C and D). We explored the expression
of genes related to neuron physiology
[ion channels and G protein-coupled receptors
(GPCRs)] and found that markers distinguishing
the entire rosehip subclass from the rest of L1,
including HTR1F [5-hydroxytryptamine (5-HT;
serotonin) receptor 1F], along with markers
distinguishing the rosehip subtypes, GRM5
(metabotropic glutamate receptor 5) and RELN
(reelin), showed lower expression in ADARB2
MC4R neurons (Fig. 4C). Together, these differences
in gene expression and physiology indicate
that there are distinct rosehip neuron
subtypes within human L1.

lines above (FDR-corrected P < 0.05, Dunn's post hoc test). Illustrative
electrophysiology traces (scale bar, 0.5 s, 20 mV) or layer-aligned morphologies
are shown for high and low values of each feature. (Inset) Image shows that
sparsity of dendrites in human PAX6 cells is not due to inability to resolve
dendrites.

In mouse L1, we identified cell types corresponding
to the MC4R subclass on the basis of
similarity in transcriptomes, but this match 
was weak compared with the matches to other 
homologous types (Fig. 1D). Similarly, there 
were no mouse cell types observed with the 
morphological signatures consistent with hu- 
man rosehip cells (fig. S5) and only partial 
matches to the electrophysiological signatures: 
The homologous mouse MC4R subclass had 
moderate sag but no irregular spiking (Fig. 4D). 
Irregular spiking resembling the human ADARB2 
MC4R rosehip t-type was present only in a 
subset of LAMP5 cells (primarily Lamp5 Ntn1 
Npy2r) that did not have other rosehip-like 
features (Fig. 4D). Although not directly match- 
ing the rosehip phenotype, the mouse homology- 
driven MC4R subclass carried similarities with 
the canopy cell (19). Matched characteristics 
included the moderate sag, along with gene 
expression (Fig. 1C, Ndnf⁺/Npy⁻) and wide
dendritic extent, but mouse MC4R cells did 
not have the canopy’s namesake L1a-dominant
axon (fig. S7, B and C). Thus, the mouse MC4R 
subclass may be a distinct NGFC-like cell pop- 
ulation but certainly lacks distinct boundaries 
that can be clearly resolved either by cross- 
species comparison or reference to previous 
mouse L1 classifications. 


Bursting PAX6 TNFAIP8L3 cells 


The other highly distinctive firing pattern in 
human L1 was in the PAX6 TNFAIP8L3 t-type, 
which fired in high-frequency bursts at the 
onset of stimulation, followed by quiescence 
or regular firing at higher stimulus amplitudes. 
Spiking and dendritic structure were qualita- 
tively distinct between this t-type and the 
neighboring PAX6 CDH12 t-type, despite some 
similarity of axonal structure and subthreshold 
electrophysiology (Fig. 5A). Both the initial firing 
rate at rheobase and the after-depolarization 
potential (ADP) that followed the final spike 
quantitatively distinguished PAX6 TNFAIP8L3 
cells from all other L1 cells (Fig. 5B), as did the 
number of dendritic branches and large horizontal
dendritic extent, more than 550 µm
wide. 
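
As a rough illustration of the firing-rate feature described above, the following minimal sketch computes an initial instantaneous firing rate from spike times. It assumes the rate is taken as the reciprocal of the first interspike interval. The function names and the 100-Hz burst threshold are hypothetical choices for illustration, not the feature definitions used in the analysis.

```python
def initial_firing_rate(spike_times_s):
    """Return the instantaneous rate (Hz) from the first interspike interval.

    spike_times_s: spike times in seconds, sorted in ascending order.
    Returns 0.0 when fewer than two spikes are present.
    """
    if len(spike_times_s) < 2:
        return 0.0
    first_isi = spike_times_s[1] - spike_times_s[0]
    if first_isi <= 0:
        raise ValueError("spike times must be strictly increasing")
    return 1.0 / first_isi


def starts_with_burst(spike_times_s, threshold_hz=100.0):
    """Flag a sweep whose initial rate reaches a hypothetical burst threshold."""
    return initial_firing_rate(spike_times_s) >= threshold_hz


# Illustrative spike times (s): a high-frequency onset followed by quiescence,
# and a regularly firing sweep for comparison.
bursting_sweep = [0.010, 0.015, 0.021, 0.300]
regular_sweep = [0.050, 0.150, 0.250, 0.350]
print(initial_firing_rate(bursting_sweep), starts_with_burst(bursting_sweep))
print(initial_firing_rate(regular_sweep), starts_with_burst(regular_sweep))
```

A threshold on a single interval is only a crude stand-in for a burst criterion. In practice the ADP and the number of spikes in the initial cluster would also be taken into account.
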
Fig. 3. Comparison of human and mouse L1. (A) Examples of NeuN labeling of
neurons in human MTG and mouse VISp. (B) Comparisons of mouse versus 
human L1 thickness, neuron density, soma area, and neuron count in 1-mm wide 
regions of interest (ROIs) of L1. Metrics are plotted per ROI for L1 thickness,
density, and neuron count, and per cell for soma area. Boxplots show quartiles, 
and asterisks indicate post hoc Dunn's test results (*P < 0.05; **P < 0.01;
***P < 0.001) (calculated for MTG versus TEa only). Counts are found in 
Materials and methods. (C) Example layer-aligned morphologies from mouse 
and human L1 subclasses. One example is shown from each t-type; the scale 
bar applies to both species. (D) (Left half) Morphological and (right half)
electrophysiological features with differences between human and mouse L1 
cells (species effect from two-way ANOVA on ranks, FDR-corrected P < 10°” for 
morphology, P < 10°? for electrophysiology). For features with a species- 
subclass interaction (P < 0.05, FDR-corrected for electrophysiology, not 
significant for morphology), asterisks indicate post hoc Dunn's test results 

(*P < 0.05; **P < 0.01; ***P < 0.001). Counts are found in table S2. (Bottom)
Representative examples from LAMP5 subclass are shown below each plot
(Left to right: layer-aligned reconstructions; action potential frequency as a 
function of current injection; response to hyperpolarizing current nearest

-100 mV, scale bar 0.5 s, 10 mV; first action potential at rheobase, scale bar 
1 ms, 20 mV). (E) Electrophysiological feature differences between human 

L1 and mouse VISp L1 (left) were validated by testing against mouse TEa (right). 
Features are selected by largest effect size against TEa [Mann-Whitney 

(MW) correlation coefficient (r), rank-biserial correlation]. Asterisks indicate 
significance (FDR-corrected MW test, *P < 0.05; **P < 0.01; ***P < 0.001).

(F) Nucleated patch recordings quantifying A-type K+ conductance of L1 neurons.
Fig. 4. MC4R rosehip cells. (A) Characterization of MC4R subtypes as rosehip
cells. (Left) UMAP projection of transcriptomic data from MC4R and nearby 
subclasses. (Right) Example cells from each subtype. Morphologies show 
characteristic axonal arbors and boutons. (Insets) 63× maximum intensity
projection (MIP) images, scale bars, 10 µm; compare to (D). Electrophysiology
traces show sag response (hyperpolarization near -100 mV, rheobase, and 
rheobase +40 pA if present; scale bar, 0.5 s, 10 mV). (B) Electrophysiological and 
morphological features distinguishing MC4R t-types (FDR-corrected MW test
versus rest of L1, P < 10“). Boxplots show statistics of MC4R subtypes and other 
subclasses (box marks quartiles; whiskers extend 1.5×IQR past box). Significant
Example traces show voltage commands (black) and recorded currents (orange)
from measurement protocol (top), along with example soma size measurement. 
(Bottom) Boxplots show fast conductance density in both species, with example 
traces shown for each group (scale bars, 200 ms, 400 pA). Only LAMP5 neurons 
were sampled in mouse. (G) Features distinguishing L1 subclasses in human and
mouse organized by relevance to each species. Bars show size of subclass effect 
(ε² from KW test), with features ranked by the difference between human

and mouse effects. Unfilled bars indicate P > 0.05 (FDR-corrected). 
pairwise comparisons (to MC4R t-types only) marked by lines above (FDR-
corrected P < 0.05, Dunn's test post hoc to KW test). (C) Gene expression of
MC4R subclass (highlighted) and other L1 t-types, for between- and within-
subclass marker genes (snRNA-seq). Violin plots show expression in log(CPM+1),
normalized by gene (maximal expression noted at right). (D) Characterization of 
mouse L1 cells with moderate sag or irregular firing, the human rosehip type’s 
distinct properties [boxplots as shown in (B), all pairwise comparisons tested]. 
Example morphology and electrophysiology are shown for mouse Lamp5 Ntn1
Npy2r cell with highly irregular firing but lack of rosehip-like morphology 
[electrophysiology traces as shown in (A)]. (Inset) Axonal boutons, scale bar, 10 µm.


In mouse L1, the homologous PAX6 subclass 
was extremely rare, comprising only a few cells 
in the Lamp5 Krt73 t-type. These cells tended to
fire in doublets at stimulus onset rather than 
in a full burst, sometimes followed by a delayed 
ADP (Fig. 5C). Some cells in the Lamp5 Fam19a1
Pax6 t-type also showed this firing pattern (fig.
S7D), likely the same subset that aligns tran- 
scriptomically to the human PAX6 subclass 
(Fig. 1C). Mouse doublet-firing cells also showed 
a depolarizing “hump” for current injection just 
below rheobase (Fig. 5D), which together with the 
marker-gene signature (Ndnf−/Vip−/Chrna7+)
identifies them as mouse α7 cells, a type previously
defined in mouse by these physiological
and gene features (19). This hump was sug- 
gested to indicate activation of T-type calcium 
channels, likely the same mechanism underlying 
the bursting in human cells (45). Bursting was
also previously noted in a subset of mouse 
single-bouquet cells (SBCs) (25), a group defined 
by loose morphological criteria. This class likely 
overlaps with the doublet-firing t-types (46), 
suggesting that they may burst under different 
physiological conditions. 

Using these insights from the cross-species 
alignment, we explored the expression of re- 
lated genes in the human PAX6 t-types (Fig. 
5C). Both t-types matched the α7 marker gene
signature (along with the SST BAGE2 t-type;
Ndnf−/Vip−/Chrna7+) and strongly expressed
the T-type calcium channel α subunit gene
CACNA1G, highlighting T-type calcium channels
as a potential factor in the burst and doublet 
firing across species (fig. S7A). Given the lack 
of bursting in PAX6 CDH12 cells, other ion-channel
genes differentially expressed between
the two human PAX6 t-types likely also play a 
role, including TRPC3, a nonspecific cation 
channel that can regulate resting membrane 
potential (47). 


Cross-modality relationships of L1 subclasses
and t-types 


Given the multiple observations of distinct- 
ness between human types in contrast with 
continuous variation between mouse types, 
we explored this difference more comprehen- 
sively by defining a common quantitative 
framework for distinctness across modalities. 
We generalized the d’ metric used for tran- 
scriptomic distinctness (23, 24), quantifying 
the performance of classifiers trained to distinguish
pairs of t-types on the basis of electrophysiological
and morphological features. The
resulting t-type similarity matrices (Fig. 6A) 
showed comparable subclass structure in both 
electrophysiology and transcriptomics, with 
smaller d’ values within subclass blocks and 
higher values outside. Of note is that d’ metrics 
were highly correlated between modalities, 
demonstrating that cell types with distinctive 
transcriptomes have similarly distinctive
electrophysiological properties (Pearson’s r = 0.59,
P = 0.00016) (Fig. 6B). The single within- 
subclass pair with a high d’ was LAMP5 NMBR 
and LAMP5 LCP2, which sit at opposite ends 
of the LAMP5 continuum. We also calculated 
d’ similarity matrices at the subclass level to 
Fig. 5. Burst spiking PAX6 TNFAIP8L3 cells. (A) Reconstructed morphologies
and example electrophysiology for bursting and nonbursting PAX6 t-types 
(TNFAIP8L3 and CDH12) (hyperpolarization near -100 mV, depolarization below 
rheobase, spiking at rheobase and rheobase +40 pA; scale bar, 0.5 s/10 mV). 
(Inset) UMAP projection of transcriptomic data from PAX6 subclass is shown. 
(B) Electrophysiological and morphological features distinguishing PAX6 TNFAIP8L3 
t-type (FDR-corrected MW test versus rest of L1, P < 0.05 for morphology, 

P < 0.01 for electrophysiology). Boxplots show statistics of PAX6 subtypes and 
other subclasses (box marks quartiles; whiskers extend 1.5×IQR past box).
Significant pairwise comparisons (to PAX6 TNFAIP8L3 only) are marked by 
horizontal lines above (FDR-corrected P < 0.05, Dunn's test post hoc to KW test).
(C) Gene expression of human PAX6 subclass (highlighted) and other L1 t-types,
for α7 type and bursting-related marker genes (snRNA-seq). Violin plots show
expression in log(CPM+1), normalized by gene (maximal expression noted at 
right). (D) (Left) Characterization of mouse L1 cells with initial doublet firing in 
terms of the human PAX6 TNFAIP8L3 type’s distinct properties. (Right) Example 
morphology and electrophysiology shown from PAX6 subclass (Lamp5 Krt73 
t-type). Depolarizing sag ratio is the normalized size of the hump at stimulus 
onset just below rheobase [electrophysiology traces as in (A); boxplots as in (B), 
all pairwise comparisons tested]. 
allow comparison between species in all three
modalities (Fig. 6C). These results confirmed 
the generally higher transcriptomic distinct- 
ness of subclasses in human and show that 
in mouse, the VIP subclass was highly distinct 
in all modalities, with other subclasses gener- 
ally less distinct. 
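
To make the distinctness metric concrete, here is a minimal sketch of d’ computed for two groups of one-dimensional classifier scores. It assumes the scores have already been produced by a pairwise classifier (not shown) and that the SD used for scaling is the root mean square of the two sample SDs. Both are assumptions for illustration rather than the exact procedure of the analysis, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(scores_a, scores_b):
    """Separation of two score distributions, scaled by their pooled SD.

    Each group needs at least two values so that a sample variance exists.
    Assumption: the pooled SD is sqrt((var_a + var_b) / 2).
    """
    pooled_sd = sqrt((variance(scores_a) + variance(scores_b)) / 2.0)
    if pooled_sd == 0:
        raise ValueError("d' is undefined when both groups have zero variance")
    return abs(mean(scores_a) - mean(scores_b)) / pooled_sd


# Illustrative classifier scores for two hypothetical t-types.
type_a_scores = [1.0, 2.0, 3.0]
type_b_scores = [4.0, 5.0, 6.0]
print(d_prime(type_a_scores, type_b_scores))
```

Because the same function applies to scores from any feature space, it can be evaluated on classifiers trained on electrophysiological, morphological, or transcriptomic features and the resulting values compared pair by pair.
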

To visualize the subclass-level distinctness 
in terms of specific electrophysiological features, 
we found the pair of features that most dis- 
tinguished each subclass and showed that 
clusters defined by these features closely match 
Fig. 6. Quantifying distinctness of L1 t-types and cross-modality structure.
(A) Pairwise distinctness of human L1 t-types, from classifiers using 
electrophysiology (left) and gene expression (right). d' (d prime) is a metric of 
separation of distribution means, scaled relative to the SD. Groups with N < 4 are 
excluded (hatched area). (B) Correlation of pairwise d’ values between 
transcriptomic and electrophysiological feature spaces. Pearson's r = 0.59;

P = 0.00016; shading shows bootstrapped 95% confidence interval of regression. 
Within-subclass pairs are shown in orange to confirm subclass structure. 
the transcriptomic subclass boundaries (Fig.
6D). We also tested the effectiveness of assign- 
ing subclass labels to neurons based on the full 
electrophysiological feature set. A multiclass 
classifier evaluated by cross-validation on the 
primary dataset had 82% accuracy balanced 
across subclasses (fig. S8A). To mimic the out- 
of-sample issues that could be encountered 
for future L1 datasets collected under different 
conditions, we also tested classifier perform- 
ance on data held out of our primary analysis 
because of equipment and protocol differences. 
After excluding features for which the distributions
strongly differed from the primary
dataset, we found comparable classification 
performance (81%, fig. S8B), reinforcing the 
utility of the human L1 subclasses for under- 
standing L1 variability even in the absence of 
transcriptomic information to assign subclass 
identity. 
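
The accuracy figures above are reported as balanced across subclasses. The sketch below shows one common way to compute such a score, assuming that "balanced" means the unweighted mean of per-subclass recall, so that an abundant subclass does not dominate the result. The function name and the example labels are illustrative only.

```python
from collections import defaultdict


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall over all classes present in true_labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled cell is required")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Illustrative labels: four cells of one subclass and two of another.
truth = ["LAMP5", "LAMP5", "LAMP5", "LAMP5", "PAX6", "PAX6"]
predicted = ["LAMP5", "LAMP5", "LAMP5", "PAX6", "PAX6", "LAMP5"]
print(balanced_accuracy(truth, predicted))
```

In this toy case the plain accuracy is 4/6, whereas the balanced score averages a recall of 3/4 for the larger group and 1/2 for the smaller one.
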


Discussion 


Using Patch-seq, we identified a coherent view 
of human L1 interneurons in which neuronal 
(C) Pairwise distinctness d’ of L1 subclasses across species and data modality.
Groups with N < 10 are excluded. (D) Clustering of human L1 cells in 
electrophysiology subspaces, with correspondence to L1 subclasses. Points show 
all L1 neurons, with the subclass of interest in color. Background color shows
cluster membership likelihoods from two-cluster Gaussian mixture model trained 
on unlabeled data. F1 scores: LAMP5, 0.81; MC4R, 0.69; PAX6, 0.89; all others,
0.5 (L1 VIP and ungrouped t-types). All features are normalized and Yeo-Johnson 
transformed to approximate Gaussian distribution. 
subclasses defined by transcriptomic distinctness
have similarly distinct morphoelectric
phenotypes. Although mouse L1 neurons had
a similar range of diversity in most features, 
the features that distinguished cell types were 
different between species, and human L1
neurons spanned a wider range of sizes. In 
addition to cell types not found in mouse L1
(VIP PCDH20 and SST BAGE2), two human 
cell types emerged with especially distinct 
phenotypes that were not matched in their 
putative homologs in mouse: the compact, high- 
sag MC4R rosehip subclass and the large, burst- 
spiking PAX6 TNFAIP8L3 t-type. Human and
mouse neurons also showed consistent differ- 
ences in certain morphological and physiologi- 
cal properties across all subclasses, despite a 
general similarity in cell size. These results 
indicate a general conservation of L1 inhibitory 
neuron diversity but with distinct specializa- 
tions in cell properties and subclass and cell- 
type proportions, likely leading to differences 
in the regulation of higher-order input to the 
human cortical circuit. 


Categorizing L1 neuron types 


We provide support for previous classification 
schemes in mouse consisting of four primary 
types in L1 but also clarify a need for precise
data-driven criteria for those types. Homology- 
driven subclasses were nearly aligned with cell- 
type classifications based on single marker 
genes (19), but this alignment was often ambi- 
guous and lacked coherence across modalities. 
Similarly, other coarse single-modality cell-type 
distinctions in L1 (late spiking versus non-late
spiking, NGFC versus SBC) likely grouped multiple
distinct subclasses and shifted the exact
boundaries (48). Our results suggest that a 
larger role for continuous variability should 
be considered when studying cell-type diversity 
in mouse L1. A continuous transition between
Ndnf+/Npy+ NGFC cells and Ndnf+/Npy−
canopy cells was previously noted (24), corresponding
to similarity between the Lamp5 Ntn1
Npy2r t-type (LAMP5 subclass) and Lamp5
Fam19a1 Tmem182 (MC4R subclass). We also
observed continuity between mouse PAX6 
(partial a7 cell match) and MC4R (partial 
canopy-cell match) types, with α7-like doublet
spiking in some MC4R cells. Additionally complicating
the view of canopy cells, the best-match
cells (MC4R subclass) had some properties at 
odds with the original definition, expressing 
Chrna7 and missing the canopy-like L1a axons,
which were observed primarily in Npy− LAMP5
cells (fig. S7, A to C). These ambiguous subclass 
boundaries are a strong contrast to the clear 
cross-modality subclass distinctions in human 
L1, perhaps related to the smaller number of
well-resolved transcriptomic types in mouse. 
We also demonstrate the benefits of detailed 
transcriptomic data over small sets of genetic 
markers for both accurately characterizing cell-type
divisions and establishing cross-species
homologies that facilitate comparative analy- 
sis. Within species, reliance on marker genes 
can overstate the distinctness of cell types, as 
with the NGFC/canopy distinction, or even lead 
to misidentification, as with the original description
of human L1 rosehip cells, which were
assigned an incorrect t-type on the basis of 
observed marker-gene patterns (20). Across 
species, the lack of conserved L1 markers was 
striking and perhaps exclusive to L1. Even using 
the full transcriptome, previous work on this 
homology found variable results for L1 types 
with different brain areas and methods (27). 
In this work, we chose to quantify similarity 
of t-types across species in a way that better 
captured ambiguity, finding strong matches 
for some types and weaker for others. In ge- 
neral, all transcriptomic homology matches 
were supported or contradicted by morpho- 
electric comparisons. Ambiguous matches may 
indicate areas of evolutionary change that 
could be illuminated by comparative or develop- 
mental analyses of additional species phyloge- 
netically related to mouse and human. For 
instance, the weak cross-species transcripto- 
mic similarity of the MC4R subclass (Fig. 1C) 
and lack of phenotypic similarity together sug- 
gest that human and mouse MC4R cells could 
represent distinct innovations in each taxon, 
rather than a true homology. 

In human L1, the strong alignment of subclass
distinctions across modalities suggests
that cells can be classified by using only mor- 
phology or electrophysiology with reasonable 
accuracy. Condition-dependent variation of 
certain electrophysiological features can present 
a challenge to this approach, however, especially 
for classifications that rely on small numbers 
of features with especially strong qualitative 
variability. We failed to observe two electro- 
physiological phenotypes that had been noted 
in past work: late spiking in human NGFCs 
(40) and full bursting in mouse SBCs (25), 
qualitative features for which conflicting ob- 
servations have also been reported (9, 49). 
Although potential contributing factors are 
numerous (age differences; donor character- 
istics; and recording conditions, including in- 
ternal solutions, temperature, and equipment 
differences), we showed that for classification 
with a large feature set, identifying and ex- 
cluding affected features can rescue reliable 
performance. 
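
One way to identify affected features, sketched here under stated assumptions, is to compare each feature's distribution between the primary and held-out datasets and to drop features that differ strongly. The text does not specify the criterion that was used. The two-sample Kolmogorov-Smirnov statistic, the 0.5 cutoff, and all names below are hypothetical choices for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum gap between ECDFs)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        # Empirical CDFs evaluated at x (fraction of values <= x).
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def stable_features(primary, held_out, max_ks=0.5):
    """Keep features whose distributions are similar in both datasets.

    primary, held_out: dicts mapping a feature name to a list of values.
    max_ks: hypothetical cutoff above which a feature is excluded.
    """
    return [
        name
        for name in primary
        if name in held_out
        and ks_statistic(primary[name], held_out[name]) <= max_ks
    ]


# Illustrative values only: one feature is stable, one shifts between datasets.
primary_data = {"sag": [0.1, 0.2, 0.3], "rheobase": [10, 20, 30]}
held_out_data = {"sag": [0.1, 0.2, 0.3], "rheobase": [100, 200, 300]}
print(stable_features(primary_data, held_out_data))
```

A classifier would then be retrained and evaluated using only the retained features, which is the step that the comparison of balanced accuracies between datasets relies on.
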


Cell types, evolution, and function 


We propose that the divergence in the L1 in- 
terneuron repertoire between mouse and hu- 
man partly reflects the increasingly complex 
role of L2/3 in the primate neocortical circuit 
(29, 50, 51); new types of pyramidal cells might 
have necessitated new types of dendritic 
inhibition. Compared with mouse, human L2/3 
excitatory neurons are more transcriptomically
distinct and show larger sublaminar
distinctions in gene expression, dendritic 
morphology, and physiology. As in L1, there
are also neuronal types in L2/3 of human MTG 
with no clear homolog in L2/3 of mouse neo- 
cortex (29). Although these observations sug- 
gest similar degrees of evolutionary divergence 
in L2/3 excitatory neurons and L1 inhibitory
interneurons, the L1 circuit might also have 
adapted to changes in deep-layer pyramidal- 
neuron populations, including the decreased 
proportions in primates of L5 extratelence- 
phalic pyramidal neurons, the prominent apical 
dendrites of which are targeted by L1 inhibition
in rodents (21, 22, 52).

Two types of interneurons stood out as 
especially distinct within human L1, with the
potential to contribute to functional differ- 
ences between human and mouse: the MC4R 
rosehip cells and the PAX6 TNFAIP8L3 t-type. 
Rosehip cells were previously shown to inhibit 
pyramidal-cell apical-dendrite shafts in L2/3 
(20); the rosehip subtypes, with distinct electro- 
physiology, could plausibly perform similar 
but distinct inhibitory functions or selectively 
modulate different pyramidal-neuron subtypes 
in L2/3 (29). For example, the irregular firing 
of the MC4R rosehip t-type suggests that this 
cell type is modulated by input in the beta 
frequency band (20), whereas the regular firing 
of the CHRNA4 rosehip t-type suggests a lack of 
beta-band modulation. The strong bursting 
dynamics and distinctive morphology of the 
PAX6 TNFAIP8L3 t-type also clearly point to 
a distinct functional role compared with that 
of neurons in neighboring subclasses or even 
the more closely related PAX6 CDH12 t-type. 
Their extended dendrites are well positioned 
to integrate local pyramidal-cell inputs across 
a broad spatial footprint and long-range axonal 
inputs across topographic boundaries, and the 
bursting would provide a strong immediate 
activation in response to strong or coinciden- 
tal input. The clear identification of a cross- 
species homology for the PAX6 subclass can 
help in deciphering its function, combining 
functional insights gained from manipulating 
mouse cells with indirect insights from the 
more distinctive morphologies of human cells. 
Conflicting connectivity patterns have been 
observed for coarser cell types that likely include
some PAX6 cells: Mouse α7 cells synapse
onto nearby L2 pyramidal neurons (8), whereas 
rat SBCs synapse onto L2/3 interneurons (53). 
Thus, more focused investigation of PAX6 
cell connectivity is needed to illuminate the 
function of this subclass, and in turn, the func- 
tional implications of specialization within 
this subclass in human L1. 

In addition to these distinct types that we 
identified in human, we observed several 
subclass-independent differences
in morphoelectric properties between species,
which likely have functional consequences. 
Forms of morphoelectric variability found in
human and not mouse L1 include sag and spike 
adaptation properties. These vary both between 
and within subclasses in human and may con- 
tribute to differences in the spectral selectivity 
of the L1 circuit. For example, the higher-voltage 
sag response in human L1 (and variation between
LAMP5 subtypes) may contribute to differences
in the temporal summation of synaptic
input, as in L2/3 pyramidal neurons (54, 55). The 
separation of dendritic and axonal arbors in 
human L1 cells, which is not found in mouse,
indicates the possibility of more sublaminar 
structure in L1 microcircuits in human. Similarly,
given the sublaminar selectivity of thalamic
projections to Vip+ versus Ndnf+ cells in
mouse L1 (56), the human L1 VIP types, which
are located in deeper layers in mouse, may be 
especially relevant to thalamocortical micro- 
circuit structure. The higher input resistance 
and lower rheobase in human imply increased 
sensitivity to low-intensity synaptic input, 
which could have direct computational func- 
tions or help compensate for circuit differences, 
such as changes in excitatory to inhibitory cell 
ratios between mouse and human (21, 22, 32).
In particular, these cross-species morphoelec- 
tric differences may indicate varying demands 
on the NGFC subclass, which was otherwise 
largely conserved across species. NGFC GABAergic 
(both synaptic and extrasynaptic) and gap junction- 
mediated transmission depend critically on the 
spatial properties of the axonal arbor (37, 57, 58), 
but the increase in arbor size in human LAMP5 
cells (~1.2×) was much less than the ~1.6× increase
in pyramidal-cell apical dendrite extent
in L1. Changes in input sensitivity could enable
the conservation of their “blanket” inhibitory 
function while also permitting some increased 
spatial and topographic selectivity; NGFC circuit 
connectivity has been shown to be both tightly 
controlled (59, 60) and to exert strong effects on 
pyramidal-cell sensory processing (1, 6).

Much experimental evidence has documented 
the importance of neuromodulatory control on 
L1 function (40, 61-63). Linking distinct subclasses
in L1 to detailed gene expression data
(including emerging spatial transcriptomics 
results) can lead to refined hypotheses for 
cell-type specific neuromodulation. In particu- 
lar, human MC4R cells distinctly expressed 
several modulatory receptor genes (Fig. S9A), 
including the melanocortin receptor MC4R, 
which plays a role in energy homeostasis in 
hypothalamus (64), serotonin receptor HTR1F,
and metabotropic glutamate receptor GRM1 
(along with differential expression of GRM5 
between MC4R t-types). Cholinergic activation
of L1 cells (67), suggested to control attention, 
may also differentially modulate MC4R cells 
relative to other L1 types as a result of their 
stronger CHRNA6 and CHRNA4 expression. 

Our findings have a few limitations related 
to studying human cell types with the Patch-seq 
approach. Certain cell types were undersampled,
including those with no mouse homolog in L1
(VIP PCDH20 and SST BAGE2), limiting our 
ability to fully characterize their properties. 
Future work using viral genetic labeling ap- 
proaches to label difficult-to-target types could 
help in this regard (65). Our cross-species com- 
parisons were limited to differences between 
mouse and human and thus could not elucidate 
evolutionary progression of the L1 interneuron 
repertoire or truly reveal human specializations. 
Additional investigations of L1 cell types in other 
species would help, whether based on single- 
cell transcriptomics alone or full characteriza- 
tion through Patch-seq. Even in mouse, our 
analysis focused on comparisons to human and 
thus did not fully resolve different views of 
mouse L1 subclasses. A definitive answer may
require a more comprehensive Patch-seq dataset, 
analyzed from a cross-species perspective and 
perhaps in concert with multiregion snRNA-seq 
and spatial transcriptomics (66). Lastly, directly 
relating any differences in intrinsic physiology, 
morphology, or gene expression to functional 
consequences requires further investigation, 
including characterizing synaptic connectivity 
and neuromodulation. 

Nonetheless, the comprehensive human L1
Patch-seq dataset reported here has generated 
a wealth of hypotheses to inspire future work, 
along with tools to support that work. Our 
analysis provides tools for the classification of 
L1 diversity in both human and mouse, insights
into functional relationships underlying physio- 
logical differences, and the clear identification 
of subclasses and subtypes that are likely to be 
of particular interest in functional studies—all 
important steps for deciphering the function 
of this enigmatic layer of neocortex. These 
data may also offer insight into various disease 
states, considering the importance of inhibi- 
tion in circuit dysfunction (67, 68). This ap- 
proach represents a roadmap for annotating 
functionally related properties onto transcrip- 
tomically defined cell-type taxonomies that 
could be applied across the primate brain—a 
crucial step toward linking cell-type diversity 
to functional diversity within a neural circuit. 


Materials and methods 


Detailed descriptions of Patch-seq data collection 
methods in the form of technical white papers 
can also be found under “Documentation” at 
http://celltypes.brain-map.org. 


Human tissue acquisition 


Surgical specimens were obtained from local 
hospitals (Seattle - Harborview Medical Center, 
Swedish Medical Center and University of 
Washington Medical Center; Amsterdam - Vrije 
Universiteit Medical Center; and Szeged - 
Department of Neurosurgery, University of 
Szeged) in collaboration with local neuro- 
surgeons. Data included in this study were 
obtained from neurosurgical tissue resections
for the treatment of refractory temporal lobe 
epilepsy, hydrocephalus, or deep-brain tumor 
(data S1). All patients provided informed con- 
sent, and experimental procedures were 
approved by hospital institute review boards 
before commencing the study (IRBs 1111798 
and 49119). Tissue was placed in slicing arti- 
ficial cerebral spinal fluid (ACSF) as soon as 
possible following resection. Slicing ACSF comprised
(in mM) 92 N-methyl-D-glucamine
chloride (NMDG-Cl), 2.5 KCl, 1.2 NaH2PO4, 30
NaHCO3, 20 4-(2-hydroxyethyl)-1-piperazineethanesulfonic
acid (HEPES), 25 D-glucose, 2
thiourea, 5 sodium-L-ascorbate, 3 sodium pyruvate,
0.5 CaCl2·4H2O, and 10 MgSO4·7H2O.
Before use, the solution was equilibrated with 
95% O2 and 5% CO2, and the pH was adjusted
to 7.3 to 7.4 by addition of 5N HCl solution. 
Osmolality was verified to be between 295 and 
310 mOsm kg⁻¹. Human surgical tissue specimens
were immediately transported (10 to
35 min) from the hospital site to the labora- 
tory for further processing. 


Mouse breeding and husbandry 


All procedures were carried out in accordance 
with the Institutional Animal Care and Use 
Committee at the Allen Institute for Brain 
Science or Vrije Universiteit. Animals (<5 mice 
per cage) were provided food and water ad 
libitum and were maintained on a regular 
12-hour light:dark cycle; rooms were kept at 
21.1°C and 45 to 70% humidity. Mice were 
maintained on the C57BL/6J background 
(RRID:IMSR_JAX:000664), and newly received 
or generated transgenic lines were backcrossed
to C57BL/6J. Experimental animals were heterozygous
for the recombinase transgenes and the
reporter transgenes. For details on transgenic 
lines, age, or other details see data S1.


Tissue processing 


Data were obtained from male and female mice 
between the ages of postnatal day (P)45 and
P70. Mice were anaesthetized with 5% isoflur- 
ane and intracardially perfused with 25 or 
50 ml of 0° to 4°C slicing ACSF. Human or 
mouse acute brain slices (350 µm) were prepared
with a Compresstome VF-300 (Precisionary 
Instruments) or VT1200S (Leica Biosystems) 
vibrating blade microtome modified for block- 
face image acquisition (Mako G125B PoE camera 
with custom integrated software) before each 
section to aid in registration to the common 
reference atlas. 

Slices were transferred to a carbogenated 
(95% O2/5% CO2) and warmed (34°C) slicing
ACSF for 10 min, then transferred to room 
temperature holding ACSF of the composition 
(72) (in mM): 92 NaCl, 2.5 KCl, 1.2 NaH2PO4,
30 NaHCO3, 20 HEPES, 25 D-glucose, 2
thiourea, 5 sodium-L-ascorbate, 3 sodium
pyruvate, 2 CaCl2·4H2O and 2 MgSO4·7H2O
for the remainder of the day until transferred
for patch-clamp recordings. Before use,
the solution was equilibrated with 95% O2 and
5% CO2, and the pH was adjusted to 7.3 using
NaOH. Osmolality was verified to be between
295 and 310 mOsm kg⁻¹.


Patch-clamp recording 


A step-by-step protocol for the Patch-seq methods 
used in this study can be found at https://www. 
protocols.io/view/patch-seq-recording-and- 
extraction-detailed-protoc-5qpvoyb2b¢40/v2. 
Slices were continuously perfused (2 ml/min) 
with fresh, warm (32 to 34°C) recording ACSF 
containing the following (in mM): 126 NaCl,
2.5 KCl, 1.25 NaH2PO4, 26 NaHCO3, 12.5 D-glucose,
2 CaCl2·4H2O, and 2 MgSO4·7H2O (pH 7.3), continuously
bubbled with 95% O2 and 5% CO2.
The bath solution contained blockers of fast 
glutamatergic (1 mM kynurenic acid) and 
GABAergic synaptic transmission (0.1 mM 
picrotoxin). Thick-walled borosilicate glass 
(Warner Instruments, G150F-3) electrodes were 
manufactured (Narishige PC-10 or Sutter Instruments
P-87) with a resistance of 4 to 5 MΩ.
Before recording, the electrodes were filled 
with ~1.0 to 2.0 ul of internal solution [110 mM 
potassium gluconate, 10.0 mM HEPES, 0.2 mM 
ethylene glycol-bis (2-aminoethylether)-N,N,N’, 
N’-tetraacetic acid, 4 mM potassium chloride, 
0.3 mM guanosine 5’-triphosphate sodium salt 
hydrate, 10 mM phosphocreatine disodium salt 
hydrate, 1 mM adenosine 5’-triphosphate magnesium 
salt, 20 µg ml⁻¹ glycogen, 0.5 U µl⁻¹ 
RNAse inhibitor (Takara, 2313A), 0.02 Alexa 
594 or 488, and 0.5% biocytin (Sigma B4261), pH 
7.3]. The pipette was mounted on a Multiclamp 
700B amplifier headstage (Molecular Devices) 
fixed to a micromanipulator (PatchStar, Scien- 
tifica or Mini25, Luigs and Neumann). 

Electrophysiology signals were recorded using 
an ITC-18 Data Acquisition Interface (HEKA). 
Commands were generated, signals were pro- 
cessed, and amplifier metadata were acquired 
using MIES (https://github.com/AllenInstitute/ 
MIES/), written in Igor Pro (Wavemetrics, 
RRID:SCR_000325). Data were filtered (Bessel) 
at 10 kHz and digitized at 50 kHz. Data were 
reported uncorrected for the measured -14-mV 
liquid-junction potential between the electrode 
and bath solutions. 

Before data collection, all surfaces, equip- 
ment, and materials were thoroughly cleaned 
in the following manner: a wipe down with 
DNA away (Thermo Scientific), RNAse Zap 
(Sigma-Aldrich) and finally with nuclease- 
free water. 

L1 was identifiable in mouse and human 
brain slices as the neuron-sparse region directly 
between the pial surface and the neuron-dense 
layer 2/3. Neurons within L1 were targeted for 
patch-clamp recordings. 

After formation of a stable seal and break-in, 
the resting membrane potential of the neuron 
was recorded (typically within the first minute). 
A bias current was injected, either manually or 
automatically using algorithms within the 
MIES data acquisition package, for the re- 
mainder of the experiment to maintain that 
initial resting membrane potential. Bias cur- 
rents remained stable for a minimum of 1 s 
before each stimulus current injection. Upon 
attaining whole-cell current clamp mode, the 
pipette capacitance was compensated and the 
bridge was balanced. 

The voltage response of each cell was recorded 
in response to a standardized stimulus para- 
digm described previously (26) that included 
square pulses, ramps, and chirps, with the goal 
of extracting features that could be compared 
across cells, rather than tailoring each stimu- 
lus to the physiological input of that neuron. 


Nucleus extraction 


Upon completion of electrophysiological exa- 
mination, the pipette was centered on the 
soma or placed near the nucleus (if visible). 
A small amount of negative pressure was applied 
(~-30 mbar) to begin cytosol extraction and 
attract the nucleus to the tip of the pipette. After 
approximately 1 min, the soma had visibly 
shrunk and/or the nucleus was near the tip of 
the pipette. While maintaining the negative 
pressure, the pipette was slowly retracted diagonally 
in the x and z directions. Slow, continuous 
movement was maintained while monitoring 
the pipette seal. Once the pipette seal reached 
>1 GΩ and the nucleus was visible on the tip of 
the pipette, the speed was increased to remove 
the pipette from the slice. The pipette con- 
taining internal solution, cytosol, and nucleus 
was removed from the pipette holder, and the 
contents were expelled into a PCR tube con- 
taining the lysis buffer (Takara, 634894). 


Voltage clamp experiments 


For a subset of experiments with a high nucleated 
patch resistance (>1000 MΩ), we measured 
macroscopic outward ionic currents in 
voltage clamp. To reduce capacitive artifacts, 
the pipette containing the nucleus was raised 
to the upper portion of the bath. For K⁺ channels, 
activation curves were constructed from 
1-s depolarizing voltage commands (−50 to 
+70 mV in 10-mV voltage steps) from a holding 
potential of −90 mV. Linear leakage and 
capacitive currents were digitally subtracted 
by scaling traces at smaller command volt- 
ages in which no voltage-dependent current 
was activated. To isolate K⁺ currents into their 
phenomenological components (A-type, D-type, 
noninactivating), we exploited the voltage-dependent 
properties of the putative channels 
underlying each component (73). The fast-inactivating 
component (I_K,A) was inactivated 
by a brief step to −20 mV followed by a series 
of 1-s voltage steps ranging from −50 to +70 mV 
in 10-mV increments. I_K,A was obtained by digitally 
subtracting the resultant current from the 
total current, post hoc. Step depolarization to 
+70 mV from a holding potential of −20 mV was 
used to inactivate all K⁺ current and revealed a 
sustained current. Subtracting the sustained current 
from the current used to isolate I_K,A revealed 
a slowly inactivating, D-type (I_K,D) current. Peak 
currents were calculated for each voltage step. 
Conductance values were calculated based on 
the recorded membrane potentials and a K⁺ 
reversal potential of −100 mV. The surface 
area of the nucleated patch was calculated to 
obtain current and conductance densities. 
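
The conductance calculation described above can be sketched as follows. This is a minimal illustration, assuming an ohmic driving force (command potential minus the K⁺ reversal potential) and a patch surface area supplied by the caller; the function name, argument names, and units are hypothetical and are not taken from the authors' analysis code.

```python
def k_conductance(peak_current_pa, command_mv, area_um2, e_k_mv=-100.0):
    """Return (conductance in nS, conductance density in pS per um^2).

    Assumes an ohmic driving force (V - E_K). e_k_mv defaults to the
    -100 mV reversal potential stated in the text. area_um2 is a
    caller-supplied estimate of the nucleated-patch surface area.
    """
    driving_force_mv = command_mv - e_k_mv
    if driving_force_mv == 0:
        raise ValueError("command potential equals the reversal potential")
    conductance_ns = peak_current_pa / driving_force_mv  # pA / mV = nS
    density_ps_per_um2 = conductance_ns * 1000.0 / area_um2  # nS -> pS
    return conductance_ns, density_ps_per_um2


# Illustrative values only: 1700 pA peak current at +70 mV, 100 um^2 patch.
g_ns, g_density = k_conductance(1700.0, 70.0, 100.0)
```

Dividing the peak current by the same area gives the current density in the corresponding units.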


Quality control 


For an individual sweep to be included in anal- 
ysis, the following criteria were applied: (1) 
membrane potential within 2 mV of target 
potential (initial resting potential of cell); (2) 
bias (leak) current 0 ± 100 pA; and (3) root 
mean square noise measurements in a short 
window (1.5 ms, to gauge high-frequency noise) 
and longer window (500 ms, to measure patch 
instability) <0.2 mV and 0.5 mV, respectively. 
For human electrophysiology in the primary 
dataset, QC filters were also imposed at the 
cell level to flag cells with >1 GΩ seal recorded 
before break-in, initial access resistance <1 or 
>20 MΩ or >25% of the input resistance. Cell 
recordings failing these tests were manually 
examined for recording quality and manually 
passed or failed. Cells also had to have features 
successfully extracted for long square pulse 
sweeps at a minimum to be included in analysis. 
For mouse VISp cells, slightly stricter automated 
QC values were imposed at the sweep and the 
cell level, following the original publication. 
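
The three sweep-level criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Sweep` container and its field names are hypothetical, and whether the 2-mV and 100-pA limits are inclusive is an assumption, since the text does not say.

```python
from dataclasses import dataclass


@dataclass
class Sweep:
    """Hypothetical per-sweep summary values (not the authors' data model)."""
    baseline_mv: float      # membrane potential before the stimulus
    target_mv: float        # initial resting potential of the cell
    bias_current_pa: float  # bias (leak) current
    rms_short_mv: float     # RMS noise in the 1.5-ms window
    rms_long_mv: float      # RMS noise in the 500-ms window


def sweep_passes_qc(s, max_dv_mv=2.0, max_bias_pa=100.0,
                    max_rms_short_mv=0.2, max_rms_long_mv=0.5):
    """Apply the three sweep-level criteria listed in the text."""
    return (abs(s.baseline_mv - s.target_mv) <= max_dv_mv
            and abs(s.bias_current_pa) <= max_bias_pa
            and s.rms_short_mv < max_rms_short_mv
            and s.rms_long_mv < max_rms_long_mv)


good = Sweep(-69.2, -70.0, 35.0, 0.05, 0.3)
drifted = Sweep(-66.0, -70.0, 35.0, 0.05, 0.3)  # 4 mV off target
results = [sweep_passes_qc(good), sweep_passes_qc(drifted)]
```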


Transcriptomic data collection 
cDNA amplification and library 
construction 


We performed all steps of RNA-processing and 
sequencing as described in our previous hu- 
man Patch-seq studies (22, 26, 29, 52). We used 
the SMART-Seq v4 Ultra Low Input RNA Kit for 
Sequencing (Takara, 634894) to reverse tran- 
scribe poly(A) RNA and amplify full-length cDNA 
according to the manufacturer’s instructions. 
We performed reverse transcription and cDNA 
amplification for 20 PCR cycles in 0.65-ml tubes, 
in sets of 88 tubes at a time. At least 1 control 
8-strip was used per amplification set, which 
contained four wells without cells and four wells 
with 10 pg control RNA. Control RNA was either 
Universal Human RNA (UHR) (Takara 636538) 
or control RNA provided in the SMART-Seq v4 
kit. All samples proceeded through Nextera 
XT DNA Library Preparation (Illumina FC-131-1096) 
using either Nextera XT Index Kit V2 Sets 
A to D (FC-131-2001, 2002, 2003, 2004) or custom 
dual indexes provided by Integrated DNA 
Technologies (IDT). Nextera XT DNA Library 
prep was performed according to manufac- 
turer’s instructions except that the volumes of 
all reagents including cDNA input were decreased 
to 0.2x by volume. Each sample was 
sequenced to approximately 1 million reads. 


RNA-seq data processing 


Fifty-base-pair paired-end reads were aligned 
to GRCh38.p2 using a RefSeq annotation gff 
file retrieved from NCBI on 11 December 2015 
for human and to GRCm38 (mm10) using a 
RefSeq annotation gff file retrieved from NCBI 
on 18 January 2016 for mouse (https://www. 
ncbi.nlm.nih.gov/genome/annotation_euk/all/). 
Sequence alignment was performed using STAR 
(v2.5.3, RRID:SCR_004463) (74) in two-pass 
mode. PCR duplicates were masked and removed 
using STAR option bamRemoveDuplicates. 
Only uniquely aligned reads were used 
for gene quantification. Gene counts were computed 
using the R GenomicAlignments package 
summarizeOverlaps function using IntersectionNotEmpty 
mode for exonic and intronic regions 
separately (75). Expression levels were calculated 
as counts of exonic plus intronic reads. For most 
analyses, log₂(counts per million (CPM) + 1)-transformed 
values were used, or CPM in the 
case of Seurat or d’ analyses. 
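
For readers unfamiliar with the transform, a minimal sketch of log₂(CPM + 1) for a single sample is given below. The gene names and counts are made up for illustration, and the function is not part of the authors' pipeline.

```python
import math


def log2_cpm(counts):
    """Map {gene: read count} to {gene: log2(CPM + 1)} for one sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {gene: math.log2(c * 1e6 / total + 1.0) for gene, c in counts.items()}


# Toy sample with one million counted reads in total.
sample = {"GENE_A": 0, "GENE_B": 1, "GENE_C": 999_999}
transformed = log2_cpm(sample)
```

Adding 1 before taking the logarithm keeps genes with zero counts at exactly 0 instead of negative infinity.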


Anatomical annotations 
Layer annotation and alignment 


To characterize the position of biocytin-labeled 
cells, a 20x brightfield and fluorescent im- 
age of DAPI (4’,6-diamidino-2-phenylindole) 
stained tissue was captured and analyzed to 
determine layer position (human and mouse) 
and region (mouse only). Using the brightfield 
and DAPI image, soma position and laminar 
borders were manually drawn for all human 
neurons and reconstructed mouse cells and 
were used to calculate depth relative to the 
pia, white matter, and/or laminar boundaries. 
The border between L1 and L2 in particular 
was visually identified as a characteristic sharp 
increase in cell-body size and density as well 
as the presence of clear pyramidal-shaped cell 
bodies in L2. Laminar locations were calcu- 
lated by finding the path connecting pia and 
white matter that passed through the cell’s 
soma coordinate and measuring distance along 
this path to laminar boundaries, pia, and white 
matter. For mouse cells without reconstruc- 
tions, pia and white matter boundaries from 
the CCF were used as references, with layer 
positions calculated by aligning the relative 
cortical depth to an average set of layer thick- 
nesses. Because laminar borders are not hard 
boundaries obeyed by cell types or the mor- 
phoelectric properties of those cell types, we 
elected to include border cells (cells within 
40 µm of the border) in the analysis of Patch-seq 
data. 

For reconstructed neurons, laminar depths 
were calculated for all segments of the mor- 
phology, and these depths were used to create 
a “layer-aligned” morphology by first rotating 
the pia-to-WM axis to vertical, then projecting 
the normalized laminar depth of each segment 
onto an average cortical-layer template. 


CCF pinning and alignment 


Mouse cells were individually manually placed 
in the appropriate cortical region and layer 
within the Allen Mouse Common Coordinate 
Framework (CCF) (76) by matching the 20x 
image of the slice with a “virtual” slice at an 
appropriate location and orientation within 
the CCF. 


Human brain region pinning 


Available surgical photodocumentation [magnetic 
resonance imaging (MRI) or brain model 
annotation] was used to place the human tissue 
blocks in approximate 3D space by matching 
the photodocumentation to an MRI reference 
brain volume “ICBM 2009b Nonlinear 
Symmetric” (77), with Human CCF overlaid 
(78) within the ITK-SNAP interactive 
software. 


Morphological reconstruction 
Biocytin histology 


A horseradish peroxidase (HRP) enzyme reac- 
tion using diaminobenzidine (DAB) as the 
chromogen was used to visualize the filled 
cells after electrophysiological recording, and 
DAPI stain was used to identify cortical layers 
as described previously (29). 


Imaging of biocytin-labeled neurons 


Mounted sections were imaged as described 
previously (79). In brief, operators captured 
images on an upright AxioImager Z2 microscope 
(Zeiss, Germany) equipped with an Axiocam 506 
monochrome camera and 0.63x Optivar lens. 
Two-dimensional (2D) tiled overview images 
were captured with a 20x objective lens (Zeiss 
Plan-NEOFLUAR 20x/0.5) in brightfield trans- 
mission and fluorescence channels. Tiled image 
stacks of individual cells were acquired at 
higher resolution in the transmission channel 
only for the purpose of automated and manual 
reconstruction. Light was transmitted using 
an oil-immersion condenser (1.4 NA). High- 
resolution stacks were captured with a 63x 
objective lens (Zeiss Plan-Apochromat 63x/1.4 
Oil or Zeiss LD LCI Plan-Apochromat 63x/1.2 
Imm Corr) at an interval of 0.28 µm (1.4 NA 
objective) or 0.44 µm (1.2 NA objective) along 
the z axis. Tiled images were stitched in ZEN 
software (RRID:SCR_013672) and exported as 
single-plane TIFF files. 


Morphological reconstruction 


Reconstructions of the dendrites and the full 
axon were generated for a subset of neurons with 
good-quality transcriptomics, electrophysiology, 
and biocytin fill. Reconstructions were gen- 
erated based on a 3D image stack that was 
run through a Vaa3D-based image processing 
and reconstruction pipeline (80). For some cells, 
images were used to generate an automated 
reconstruction of the neuron using TReMAP 
(81). Alternatively, initial reconstructions were 
created manually using the reconstruction 
software PyKNOSSOS (https://www.ariadne.ai/) 
or the citizen neuroscience game Mozak (82) 
(https://www.mozak.science). Automated or 
manually initiated reconstructions were then 
extensively manually corrected and curated 
using a range of tools (for example, virtual finger 
and polyline) in the Mozak extension (Zoran 
Popovic, Center for Game Science, University of 
Washington) of Terafly tools (83, 84) in Vaa3D. 
Every attempt was made to generate a completely 
connected neuronal structure while remaining 
faithful to image data. If axonal processes could 
not be traced back to the main structure of the 
neuron, they were left unconnected. 


Slice immunohistochemistry 
Immunohistochemistry and slide imaging 


Tissue slices (350 µm thick) designated for 
histological profiling were fixed for 2 to 4 days 
in 4% paraformaldehyde (PFA) in phosphate- 
buffered saline (PBS) at 4°C and transferred to 
PBS, 0.1% sodium azide for storage at 4°C. For 
human samples, these slices were interspersed 
with Patch-seq slices when preparing each 
tissue block, while for mouse the histology slices 
were from separate tissue blocks prepared fol- 
lowing the same protocol. Slices were then 
cryoprotected in 30% sucrose, frozen and 
resectioned at 30 µm (human tissue) and 20 µm 
(mouse tissue) using a sliding microtome (Leica 
SM2000R). Sections were stored in PBS with 
azide at 4°C in preparation for immuno- 
histochemical staining. Staining for Neu-N 
(Neuronal nuclei; Millipore, MAB377, 1:2,000) 
with DAB was applied using the Biocare 
Intellipath FLX slide staining automated 
platform [as described previously (29)]. The images of 
stained subsections were acquired with a 20x 
objective on an Aperio microscope at a resolution 
of 1 µm per pixel (human slices) and 0.989 µm 
per pixel (mouse slices). Full immunohistology 
protocol details are available at 
http://help.brain-map.org/download/attachments/8323525/CellTypes_Morph_Overview.pdf?version=4&modificationDate=1528310097913&api=v2 


Cytoarchitecture analysis 


Images of Neu-N stained sections in which 
somas were clearly distinguishable were selected 
for further analysis [human MTG: N = 76 
regions of interest (ROIs), 3016 cells, 19 slices 
from 16 cases; FCtx: N = 23 ROIs, 1113 cells, 8 
slices from 8 cases; TeA: N = 29 ROIs, 400 cells, 
8 slices, 3 animals; VISp: N = 29 ROIs, 425 cells, 
8 slices, 3 animals]. The quantification of cell 
densities and cell body sizes was performed 
using custom MATLAB scripts (R2022a, Math- 
works, RRID:SCR_001622). Within each sub- 
section, several ROIs containing only L1 were 
selected manually, each ROI a rectangle of 
500 to 700 µm in length and the full extent of 
L1 in height. The border between L1 and L2 
was visually identified as a characteristic sharp 
increase in cell-body size and density as well as 
the presence of clear pyramidal-shaped cell 
bodies in L2. The MATLAB image-processing 
scripts identified the cells by binarizing the 
image and applying watershed transform. Cell 
area was extracted after applying the scaling 
factor (1 µm/pixel for human and 0.989 µm/pixel 
for mouse). The cell densities were calculated 
as the number of cells divided by the 
volume of tissue (the product of selected ROI 
area and tissue subsection thickness: 30 µm 
for human tissue and 20 µm for mouse tissue). 
Statistical tests were performed using the Kruskal-Wallis 
test with post hoc comparisons. 
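
The area scaling and density arithmetic described above can be sketched in a few lines. This is an illustration in Python, not the authors' MATLAB code; the function names and the choice to report density per cubic millimeter are assumptions.

```python
def cell_area_um2(area_px, um_per_px):
    """Convert a segmented cell area from pixels to square micrometers."""
    return area_px * um_per_px ** 2


def cell_density_per_mm3(n_cells, roi_area_um2, section_thickness_um):
    """Cells per cubic millimeter: count divided by ROI area x thickness."""
    volume_um3 = roi_area_um2 * section_thickness_um
    return n_cells * 1e9 / volume_um3  # 1 mm^3 = 1e9 um^3


# Toy example: 30 cells in a 500 x 200 um ROI of a 30-um human subsection.
density = cell_density_per_mm3(30, 500.0 * 200.0, 30.0)
mouse_area = cell_area_um2(100, 0.989)  # 100 pixels at 0.989 um/pixel
```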


Transcriptomic data analysis 
Reference data from dissociated cells 
and nuclei 


Reference transcriptomic data used in this 
study were obtained from dissociated inhibi- 
tory cells (mouse) or nuclei (human) collected 
from human MTG (27) and mouse VISp (46) 
and are publicly accessible at the Allen Brain 
Map data portal (https://portal.brain-map. 
org/atlases-and-data/rnaseq) and Transcrip- 
tomics Explorer (RRID:SCR_017567). Layer 
1 t-types were assessed by proportions in these 
datasets after first reducing sampling bias by 
selecting only samples with even dissections 
across all cortical layers and in mouse addi- 
tionally restricting to samples targeted by 
pan-neuronal or pan-GABAergic mouse lines. 
As previously described (27), 500-um thick sec- 
tions were stained for Nissl to permit visualiza- 
tion of layers. Layer dissections were made 
based on Nissl stain under a dissecting mi- 
croscope using a needle blade microknife. 


Definition of L1 types from reference data 


To accommodate imprecision in L1 dissections, 
we used a high cutoff for proportion of type in 
L1 to be considered a L1 type. L1 t-types were 
defined as types making up >5% of L1 cells or 
with >50% of the type found in L1 dissections. 
For human t-types (with relatively unbiased 
Patch-seq sampling), we verified borderline 
t-types with the more-precise layer boundaries 
in Patch-seq data, excluding the VIP LBH type 
(<1% of L1, <25% in L1) and including the PAX6 
TNFAIP8L3 type (>1% of L1, >50% in L1). 
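
The two-part threshold rule for L1 types can be sketched as below. The type names and counts are invented for illustration, and the manual verification of borderline types against Patch-seq data is not modeled.

```python
def select_l1_types(counts, min_frac_of_l1=0.05, min_frac_in_l1=0.5):
    """Return types that make up >5% of L1 cells or have >50% of cells in L1.

    counts maps type name -> {layer label: number of dissected cells}.
    """
    l1_total = sum(layers.get("L1", 0) for layers in counts.values())
    selected = []
    for type_name, layers in counts.items():
        n_in_l1 = layers.get("L1", 0)
        n_type = sum(layers.values())
        frac_of_l1 = n_in_l1 / l1_total if l1_total else 0.0
        frac_in_l1 = n_in_l1 / n_type if n_type else 0.0
        if frac_of_l1 > min_frac_of_l1 or frac_in_l1 > min_frac_in_l1:
            selected.append(type_name)
    return selected


# Hypothetical dissection counts for four made-up types.
toy_counts = {
    "A": {"L1": 90, "L2": 10},  # abundant in L1
    "B": {"L1": 4, "L2": 1},    # rare, but mostly found in L1
    "C": {"L1": 4, "L2": 96},   # rare in L1 and mostly elsewhere
    "D": {"L1": 2},             # rare, but exclusively in L1
}
l1_types = select_l1_types(toy_counts)
```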

For further analysis of these reference transcriptomic 
datasets, data were restricted to only 
L1 and the adjacent layer (L2 in human, L2/3 
in mouse), to be as selective as possible while 
not risking the exclusion of L1/2 border cells in 
case of imprecise dissections. 


Discriminant analysis (d’) 


For measuring distinctness between types in 
transcriptomic feature space, we followed the 
cross-validated negative binomial (NB) discriminant 
analysis from (23). For each pair of 
t-types, a set of cross-validated log-likelihood 
ratios (LLR) were calculated for each cell, 
fitting a NB classifier to the training split and 
measuring LLR for the test split across rounds 
of fivefold cross-validation. The classifier was a 
naive Bayes negative binomial model, with inde- 
pendent negative binomial distributions fit for 
each subset on each feature (gene) by maximum 
likelihood, with dispersion parameter set to 
r = 1 following the observed statistics of our 
dataset. 

This produced a distribution of likelihood 
ratios for each t-type in the pair, the separation 
of which was summarized by the d’ statistic. 
For normal distributions this is typically cal- 
culated as the separation of means divided 
by the standard deviation, but we instead used 
a nonparametric form (equivalent in the normal 
distribution case): d’ = √2 Φ⁻¹(AUC), where 
Φ is the CDF of the standard normal distribution, 
and AUC is the area under the receiver 
operating characteristic curve for the classifier 
(equivalently, the proportion of pairs selected 
one from each type for which the LLR of the 
cluster 1 cell is higher than the cluster 2 cell). 
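
The nonparametric d’ can be sketched from two lists of cross-validated LLR values using the pairwise definition of AUC and the standard relation d’ = √2 Φ⁻¹(AUC). The tie handling (half credit) and the clipping of AUC away from 0 and 1, which avoids infinite values for perfectly separated types, are assumptions made for this illustration and are not described in the text.

```python
from math import sqrt
from statistics import NormalDist


def auc_from_llr(llr_type1, llr_type2):
    """Proportion of cross-type pairs in which the type-1 LLR is higher."""
    wins = 0.0
    for a in llr_type1:
        for b in llr_type2:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # assumption: ties count as half
    return wins / (len(llr_type1) * len(llr_type2))


def d_prime(auc, eps=1e-6):
    """d' = sqrt(2) * inverse standard-normal CDF of the AUC."""
    auc = min(max(auc, eps), 1.0 - eps)  # assumption: clip to keep d' finite
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


auc = auc_from_llr([2.0, 3.0], [1.0, 2.5])  # 3 of 4 pairs favor type 1
dp = d_prime(auc)
```

For two unit-variance normal distributions whose means differ by d’, the AUC equals Φ(d’/√2), which is why inverting that relation recovers the usual parametric value.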

This method was adapted for other modali- 
ties (Patch-seq transcriptomics, morphology, 
and electrophysiology) by simply using differ- 
ent classifiers. For electrophysiology and mor- 
phology, we used a random forest classifier 
with scikit-learn default parameters and bal- 
anced class weighting. For Patch-seq trans- 
criptomics, we modified the naive Bayes negative 
binomial model to use a zero-inflated negative 
binomial distribution (statsmodels) (85). Given 
the high number of free parameters in this 
model, it was not directly suitable to fitting on 
the small datasets necessary for the pairwise 
discriminant analysis. Using ZINB fits to each 
gene across the full reference dataset, we 
observed that the parameters π (zero-inflation 
probability) and α (NB dispersion or shape 
parameter) both nearly followed a curve depending 
on μ, the NB distribution mean. We 
parametrized these curves (α with a spline fit, 
π with a sigmoid), and used them to constrain 
ZINB fits for discriminant analysis, assuming 
that for all genes, zero-inflation parameters fol- 
low the same dependence on mean expression. 
Given these constraints, the maximum likeli- 
hood fit for each gene could be implemented 
simply as a lookup table. 


Subclasses and cross-species homology 


Human L1 transcriptomic subclasses were defined 
based on the d’ values by grouping all 
pairs of t-types with d’ <2.2, equivalent to ap- 
proximately 2% overlap of LLR distributions. 
Four pairs of t-types were grouped in this way, 
forming three subclasses. 

We defined cross-species homology of L1 
t-types following a variation of the procedure 
in (27), using coordinates for each cell in a space 
integrating the mouse and human transcrip- 
tomic references (calculated in the original 
from ScAlign, 30 dimensions). Instead of relying 
on defining clusters in that integrated space and 
measuring overlap within those clusters, we 
directly defined a similarity metric for any two 
clusters in that space: the ratio of the mean 
intracluster difference to the mean intercluster 
difference. Intracluster difference was averaged 
over all pairs of cells within each cluster, then 
averaged over the two clusters; intercluster dif- 
ference was averaged directly over all inter- 
cluster pairs. We summarized this similarity 
metric for all mouse L1 t-types aligned to 
human L1 subclasses (Fig. 1) and to individual 
human L1 t-types (fig. S1). 
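
The similarity metric can be sketched for two clusters of points as follows. The use of Euclidean distance as the "difference" between cells is an assumption, and the coordinates are toy 2D points standing in for the 30-dimensional integrated space.

```python
from itertools import combinations, product
from math import dist


def mean_intracluster(cluster):
    """Mean pairwise distance over all pairs of cells within one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        raise ValueError("cluster needs at least two cells")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def cluster_similarity(cluster1, cluster2):
    """Ratio of mean intracluster distance to mean intercluster distance."""
    intra = (mean_intracluster(cluster1) + mean_intracluster(cluster2)) / 2.0
    inter = (sum(dist(a, b) for a, b in product(cluster1, cluster2))
             / (len(cluster1) * len(cluster2)))
    return intra / inter


mouse_type = [(0.0, 0.0), (0.0, 1.0)]
near_human = [(1.0, 0.0), (1.0, 1.0)]
far_human = [(10.0, 0.0), (10.0, 1.0)]
sim_near = cluster_similarity(mouse_type, near_human)
sim_far = cluster_similarity(mouse_type, far_human)
```

Well-separated clusters give values near 0, while overlapping clusters give values near or above 1.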


Patch-seq data curation and mapping 


Patch-seq samples were included in this data- 
set if they met the following transcriptomic 
quality criteria: a normalized sum of “on” type 
marker-gene expression (NMS) greater than 
0.4 and a normalized sum of nonneuronal 
contamination markers less than 2 (86). 

We mapped Patch-seq samples to reference 
taxonomies from the reference single cell/ 
nuclei RNA-sequencing datasets introduced 
above, consisting of a hierarchical dendrogram 
of cell types, with a subset of cells from the 
reference identified with each node of the tree 
and a set of marker genes defined to distin- 
guish types at each split in the tree (up to 50 
markers in each direction per branch point, 
limited to robust markers only as defined by 
select_markers_pair_group in the scrattch.hicat 
package). The Patch-seq transcriptomes 
were mapped to the reference taxonomy following 
the “tree mapping” method (map_dend_membership 
in the scrattch.hicat package). 
Briefly, at each branch point of the taxonomy 
we computed the correlation of the mapped 
cell’s gene expression with that of the refer- 
ence cells on each branch, using the markers 
associated with that branch point (i.e., the 
genes that best distinguished those groups in 
the reference) and chose the most correlated 
branch. The process was repeated until reach- 
ing the leaves of the taxonomy (t-types). To 
determine the confidence of mapping, we ap- 
plied 100 bootstrapped iterations at each branch 
point, and in each iteration, 70% of the reference 
cells and 70% of markers were randomly sam- 
pled for mapping. The percentage of times a 
cell was mapped to a given t-type was defined 
as the mapping probability, and the highest 
probability t-type was assigned as the mapped 
cell type. 
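
The bootstrapped mapping probability can be illustrated with a deliberately simplified sketch. It is not the scrattch.hicat implementation: it compares a cell against a flat set of types in one step rather than descending a tree, it uses the mean expression of the sampled reference cells as each type's profile, and every name in it is hypothetical.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mapping_probabilities(cell, reference, markers, n_iter=100, frac=0.7, seed=0):
    """Fraction of bootstrap rounds in which each type is the best match.

    cell: {gene: expression}; reference: {type: [{gene: expression}, ...]}.
    Each round samples 70% of the markers and 70% of each type's cells.
    """
    rng = random.Random(seed)
    votes = Counter()
    n_markers = max(2, round(frac * len(markers)))
    for _ in range(n_iter):
        genes = rng.sample(markers, n_markers)
        query = [cell[g] for g in genes]
        best_type, best_r = None, float("-inf")
        for type_name, cells in reference.items():
            k = max(1, round(frac * len(cells)))
            subset = rng.sample(cells, k)
            profile = [sum(c[g] for c in subset) / k for g in genes]
            r = pearson(query, profile)
            if r > best_r:
                best_type, best_r = type_name, r
        votes[best_type] += 1
    return {t: votes[t] / n_iter for t in reference}


genes = ["g1", "g2", "g3", "g4"]
ref = {
    "typeA": [dict(zip(genes, v)) for v in ([10, 0, 10, 0], [9, 1, 11, 0], [11, 0, 9, 1])],
    "typeB": [dict(zip(genes, v)) for v in ([0, 10, 0, 10], [1, 9, 0, 11], [0, 11, 1, 9])],
}
query_cell = dict(zip(genes, [9, 1, 8, 0]))
probs = mapping_probabilities(query_cell, ref, genes)
best = max(probs, key=probs.get)
```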

Only cells mapping to the identified L1 
t-types were included in subsequent morphoelectric 
feature analysis. Neurons from non-L1 
t-types present in human L1 were also included 
in the L1 proportion analysis, and those 
for which morphological reconstructions were 
available were included in the supplementary 
morphology gallery. Some additional quality 
filters were applied to the mouse VIS cells 
only, following the procedure in their orig- 
inal publication: excluding cells with poor 
RNA amplification and “inconsistent” cells as 
defined by unexpected patterns of mapping 
probabilities. 


Joint visualization 


We visualized transcriptomic diversity using 
a nonlinear projection of a transcriptomic 
space following integration of each species’ 
Patch-seq and reference dissociated cell/nuclei 
datasets. We first excluded genes potentially 
related to technical variables: X and Y chro- 
mosome genes, mitochondrial genes [Human 
MitoCarta2.0], and genes most highly expressed 
in a nonneuronal cell type in the reference 
dataset. For human Patch-seq samples, which 
had more variable quality of transcriptomic 
data, we additionally excluded a small set of 
immune/glial activation-related genes that were 
shown to introduce non-cell-type-related var- 
iability (65, 87). 

This filtered gene set was loaded and pro- 
cessed by the Seurat pipeline (33): Expression 
values were first normalized by the SCTrans- 
form model, then the 3000 most variable genes 
were transformed by CCA and nonlinear warp- 
ing to integrate the Patch-seq and reference 
datasets (functions FindIntegrationAnchors 
and IntegrateData). For human Patch-seq 
samples only, the SCTransform normalization 
additionally reduced effects of contamination 
by regressing against the normalized contami- 
nation marker sum for each cell. The integrated 
space was then transformed by PCA (30 PCs) 
followed by UMAP projection (to two dimen- 
sions) for visualization. 


Electrophysiology feature analysis 


For all electrophysiology stimuli that elicited 
spiking, action potentials were detected by 
first identifying locations where the smoothed 
derivative of the membrane potential (dV/dt) 
exceeded 20 mV ms⁻¹, then refining on the 
basis of several criteria, including threshold- 
to-peak voltage, time differences, and abso- 
lute peak height. For each action potential, 
threshold, height, width (at half-height), fast 
after-hyperpolarization (AHP) and interspike 
trough were calculated (trough and AHP 
were measured relative to threshold), along 
with maximal upstroke and downstroke rates 
dV/dt and the upstroke/downstroke ratio 
(that is, ratio of the peak upstroke to peak 
downstroke). 
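
The first detection step, thresholding dV/dt at 20 mV ms⁻¹, can be sketched as below. This is a simplified illustration and not the IPFX implementation: it omits the smoothing of the derivative and the later refinement criteria, and the synthetic triangular "spikes" are made up.

```python
def detect_putative_spikes(v_mv, sample_rate_hz, dvdt_threshold=20.0):
    """Return sample indices where dV/dt first rises above the threshold.

    v_mv is the membrane potential in mV; the derivative is in mV/ms.
    """
    dt_ms = 1000.0 / sample_rate_hz
    dvdt = [(b - a) / dt_ms for a, b in zip(v_mv, v_mv[1:])]
    return [i for i in range(1, len(dvdt))
            if dvdt[i] > dvdt_threshold and dvdt[i - 1] <= dvdt_threshold]


# Synthetic 50-kHz trace: flat baseline, 50 mV/ms upstroke, symmetric decay.
one_spike = ([-70.0] * 100
             + [-70.0 + k for k in range(1, 51)]
             + [-20.0 - k for k in range(1, 51)]
             + [-70.0] * 100)
trace = one_spike * 2
onsets = detect_putative_spikes(trace, 50_000)
```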

Following spike detection, summary features 
were calculated from sweeps with long square 
pulse current injection: input resistance (all 
hyperpolarizing sweeps, −10 to −90 pA), sag 
(hyperpolarizing sweep with response closest 
to −100 mV, generally −90 pA stimulus, and 
depolarizing sag on subthreshold response 
closest to rheobase), rheobase, and f-I slope 
(all spiking sweeps, up to rheobase +80 pA). 
Spike-train properties were calculated for each 
spiking sweep: latency, average firing rate, 
initial instantaneous firing rate (inverse of first 
ISI), mean and median ISI, ISI CV, irregularity 
ratio, and adaptation index. These spike-train 
features and the single-spike properties listed 
above (measured on the first action potential) 
were summarized for both the rheobase sweep 
and a stimulus 40 pA above rheobase. For 
spike upstroke, downstroke, width, threshold, 
and interspike interval (ISI), “adaptation ratio” 
features were calculated as a ratio of the spike 
features between the first and third spike (on 
the lowest amplitude stimulus to elicit at least 
four spikes). 
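
Several of the spike-train features named above follow directly from the spike times. The sketch below is illustrative and is not the IPFX code; the feature names are hypothetical, the ISI CV uses the population standard deviation by assumption, and the irregularity ratio and adaptation index are omitted because their exact definitions are not given in the text.

```python
from statistics import mean, median, pstdev


def spike_train_features(spike_times_s, stim_start_s, stim_end_s):
    """Basic spike-train features for one spiking sweep."""
    if not spike_times_s:
        raise ValueError("no spikes in this sweep")
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    features = {
        "latency_s": spike_times_s[0] - stim_start_s,
        "avg_rate_hz": len(spike_times_s) / (stim_end_s - stim_start_s),
    }
    if isis:
        features["first_isi_rate_hz"] = 1.0 / isis[0]  # inverse of first ISI
        features["mean_isi_s"] = mean(isis)
        features["median_isi_s"] = median(isis)
        features["isi_cv"] = pstdev(isis) / mean(isis)
    return features


# Four spikes during a 1-s stimulus that starts at 0.05 s.
feats = spike_train_features([0.10, 0.20, 0.35, 0.55], 0.05, 1.05)
```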

Spike-shape properties were also calculated 
for short (3-ms) pulse stimulation and a slowly 
increasing current ramp stimulus (first spike 
only). A subset of cells also had subthreshold 
frequency response characterized by a logarith- 
mic chirp stimulus (sine wave with exponen- 
tially increasing frequency), for which the 
impedance profile was calculated and char- 
acterized by features including the peak fre- 
quency and peak ratio. Feature extraction was 
implemented using the IPFX Python package; 
custom code used for chirps and some high-level 
features will be released in a future 
version of IPFX. 


Morphology feature analysis 


Prior to morphological feature analysis, re- 
constructed neuronal morphologies were ex- 
panded in the dimension perpendicular to the 
cut surface to correct for shrinkage (88, 89) 
after tissue processing. The amount of shrink- 
age was calculated by comparing the distance 
of the soma to the cut surface during record- 
ing and after fixation and reconstruction. For 
mouse cells, a tilt-angle correction was also 
performed based on the estimated difference 
(via CCF registration) between the slicing 
angle and the direct pia-white matter direc- 
tion at the cell’s location (79). Features pre- 
dominantly determined by differences in the 
z-dimension were not analyzed to minimize 
technical artifacts due to z-compression of 
the slice after processing. 

Morphological features were calculated as 
previously described (79). In brief, feature de- 
finitions were collected from prior studies 
(90, 91). Features were calculated using the skeleton_keys 
(https://github.com/AllenInstitute/skeleton_keys) 
and neuron_morphology 
(https://github.com/AllenInstitute/neuron_morphology) 
Python packages. Features were extracted 
from neurons aligned in the direction perpendicular 
to pia and white matter. Laminar 
axon distribution (bin size of 5 µm) and earth 
mover’s distance features require a layer-aligned 
version of the morphology where node depths 
are registered to an average interlaminar depth 
template. 


Morphoelectric modeling analysis 


To explore relationships between morphology 
and input resistance, biophysically detailed 
multicompartmental models were tested for all 
LAMP5 cells in both species using the BMTK 
modeling framework (92) and the NEURON 
(RRID:SCR_005393) simulator. Models were 
instantiated for each morphology using a single 
set of parameters (leak and intracellular con- 
ductance, membrane resistance, ion channels, 
etc.) obtained from a previously optimized 
model (79) for a mouse NGFC cell (viewable at 
http://celltypes.brain-map.org/experiment/ 
electrophysiology/475585413). After an ini- 
tial transient to ensure models were stable at 
their resting membrane potential, input resist- 
ance was measured using the NEURON Impe- 
dance tool for linearized input resistance at the 
soma compartment. The resulting morphology- 
predicted input resistance values were re- 
gressed against measured values by species to 
assess how strongly morphologies contributed 
to observed variation in input resistance. 


Statistical analysis of variability 


Unless otherwise specified, statistical analyses 
were implemented in python using the stats- 
models package (85), and clustering and classi- 
fication methods were implemented using 
scikit-learn (93). Samples consisted of inde- 
pendent biological replicates (Patch-seq recorded 
neurons), with counts given in table S2. 


Variation by subclass and species 


To assess the variability of morphoelectric 
features by subclass within species, we used 
a one-way analysis of variance (ANOVA) on 
ranks [Kruskal-Wallis (KW) test] for each 
feature by subclass. Results were reported as 
fraction of variance explained (ε²) and KW 
test P value. P values were corrected for false 
discovery rate (FDR, Benjamini-Hochberg 
procedure) across all features for each data 
modality. Post-hoc Dunn’s tests were run across 
all pairs of subclasses (excluding ungrouped 
t-types), and results were FDR-corrected. 
Relationships between features and other 
variables, including cell depth, brain region, 
or donor characteristics, were likewise assessed 
by Mann-Whitney (MW) tests for binary varia- 
bles, KW tests for categorical variables, and Spearman’s or 
Pearson’s correlations for continuous variables, 
all FDR-corrected across features by modality. 
For cross-species analysis, samples were re- 
stricted to cells located within L1 and 
belonging to one of the homologous subclasses 
in each species. Overall cross-species variation 
was assessed by a MW test for each feature, 
ranked by effect size r, and FDR-corrected by 
modality. Subclass dependence of these dif- 
ferences was assessed by a two-way ANOVA 
on species and subclass (heteroskedasticity- 
corrected); where species-subclass interactions 
were found along with species differences, this 
was followed by post-hoc tests for species dif- 
ferences within each subclass (MW test). 
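
As a rough sketch of the per-feature test and correction described above, the following pure-Python code computes a Kruskal-Wallis H statistic, an effect size, and Benjamini-Hochberg adjusted P values. The function names are hypothetical. The effect-size formula H / (n - 1) is one common definition of epsilon-squared and is an assumption here, because the text does not give the formula. The P value itself (from a chi-squared distribution) is left out; in practice it would come from a statistics library.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_h(groups):
    """Kruskal-Wallis H with the standard tie correction (undefined if all values tie)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction

def epsilon_squared(h, n):
    """Assumed effect size: H / (n - 1), with n the total sample count."""
    return h / (n - 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

The correction would be applied once per data modality, across the P values of all features in that modality.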


Clustering and classification 


Clustering and classification tasks required 
some preprocessing of electrophysiology data 
to deal with missing and outlier values: We per- 
mitted cells with partially incomplete recordings, 
for instance, to maximize the usage of available 
data. Preprocessing included outlier removal, 
data standardization, and imputation, after ex- 
cluding cells with more than 60% of electro- 
physiological features missing. Extreme outliers 
were removed first (LocalOutlierFactor < -20); 
for standardization, features were centered 
about the median and scaled by interquartile 
range (IQR) (RobustScaler); missing values 
were imputed as the mean of five nearest neigh- 
bors (KNNImputer). 
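
Two of these steps, the missing-data filter and the median/IQR scaling, can be sketched in pure Python as below. This is only an illustration: the function names and example values are hypothetical, missing values are represented as `None`, and outlier removal and nearest-neighbor imputation (done with the scikit-learn classes named above) are not reproduced here.

```python
from statistics import median, quantiles

def drop_sparse_cells(cells, max_missing=0.6):
    """Keep cells whose fraction of missing (None) features is at most max_missing."""
    return [c for c in cells if sum(v is None for v in c) / len(c) <= max_missing]

def robust_scale(cells):
    """Center each feature on its median and divide by its IQR, skipping None."""
    scaled = [list(c) for c in cells]
    for j in range(len(cells[0])):
        observed = [c[j] for c in cells if c[j] is not None]
        # method="inclusive" uses linear interpolation between data points
        q1, _, q3 = quantiles(observed, n=4, method="inclusive")
        center, iqr = median(observed), q3 - q1
        for row in scaled:
            if row[j] is not None:
                row[j] = (row[j] - center) / iqr
    return scaled

# Hypothetical feature table: one row per cell, one column per feature.
cells = [
    [1.0, 10.0],
    [2.0, None],   # 50% missing: kept
    [3.0, 30.0],
    [4.0, 20.0],
    [5.0, 40.0],
    [None, None],  # 100% missing: dropped
]
kept = drop_sparse_cells(cells)
scaled = robust_scale(kept)
```

Scaling by median and IQR rather than mean and standard deviation keeps the few remaining outliers from dominating the scale of each feature.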

Following this preprocessing, electrophysiology 
and morphology classifiers were trained and 
tested in a pairwise manner, following the dis- 
criminant analysis technique described above, 
as well as on the full multiclass problem of 
assigning subclass labels to the full dataset 
based on electrophysiology. For this problem a 
multiclass logistic regression classifier was 
used, with balanced class weights. To assess 
within-dataset performance, repeated strati- 
fied fivefold cross-validation was used, with 
classifier predictions on test data aggregated 
across cross-validation folds to calculate a con- 
fusion matrix of performance. Performance 
was additionally assessed on a held-out second- 
ary electrophysiology dataset, as described in 
the main text. To prevent features with dataset 
dependence from degrading performance, af- 
fected features were excluded based on a one-way 
ANOVA for the effects of dataset (primary or two 
secondary datasets collected at different sites). 
Features with P < 0.05 or R² > 0.05 were excluded. 
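
The R² part of this exclusion rule might look like the sketch below. It assumes R² is the fraction of total variance explained by dataset membership (between-group sum of squares over total sum of squares); the function names, dataset labels, and values are hypothetical, and the P < 0.05 criterion is omitted because it needs an F-distribution from a statistics library.

```python
def anova_r_squared(groups):
    """Fraction of variance explained by group membership (SS_between / SS_total).

    Undefined (division by zero) if every value is identical.
    """
    pooled = [v for g in groups for v in g]
    grand_mean = sum(pooled) / len(pooled)
    ss_total = sum((v - grand_mean) ** 2 for v in pooled)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

def dataset_independent_features(feature_values, max_r2=0.05):
    """feature_values: {feature: {dataset: [values]}}; returns the features kept."""
    return [
        name
        for name, by_dataset in feature_values.items()
        if anova_r_squared(list(by_dataset.values())) <= max_r2
    ]

# Hypothetical values for one stable and one dataset-dependent feature.
example = {
    "stable": {"primary": [1, 2, 3], "site_a": [1, 2, 3], "site_b": [1, 2, 3]},
    "shifted": {"primary": [1, 2, 3], "site_a": [11, 12, 13], "site_b": [1, 2, 3]},
}
kept_features = dataset_independent_features(example)
```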

To demonstrate and visualize discrimination 
based on small subsets of electrophysiology 
features, we searched for 1D and 2D feature 
subspaces in which each subclass clustered 
separately from all other cells. A two-cluster 
Gaussian mixture model was fit to the data in 
each subspace, and performance was assessed 
by F1 score (harmonic mean of precision and recall) 
after identifying the cluster that best matched 
the subset of interest. Results were shown for 
the highest-ranked subspace for each subclass. 
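
The scoring step can be illustrated as follows, assuming cluster assignments from a two-cluster model are already available. The function names and example labels are hypothetical, and fitting the Gaussian mixture model itself is not shown.

```python
def f1_score(predicted, actual):
    """Harmonic mean of precision and recall for two boolean sequences."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cluster_f1(cluster_labels, in_subclass):
    """Return (cluster, F1) for the cluster that best matches the subclass."""
    scores = {
        c: f1_score([label == c for label in cluster_labels], in_subclass)
        for c in set(cluster_labels)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical output of a two-cluster model on eight cells.
cluster_labels = [0, 0, 0, 1, 1, 1, 1, 0]
in_subclass = [True, True, True, False, False, False, False, False]
best, score = best_cluster_f1(cluster_labels, in_subclass)
```

Ranking candidate subspaces by this score and keeping the top one per subclass would then give the highest-ranked subspace.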


MERFISH data collection 


Human postmortem frozen brain tissue was 
embedded in Optimum Cutting Temperature 
medium (VWR, 25608-930) and sectioned on 
a Leica cryostat at -17°C at 10 µm onto Vizgen 
MERSCOPE coverslips. These sections were 
then processed for MERSCOPE imaging ac- 
cording to the manufacturer’s instructions. 
Briefly, sections were allowed to adhere to these 
coverslips at room temperature for 10 min prior 
to a 1-min wash in nuclease-free phosphate 
buffered saline (PBS) and fixation for 15 min 
in 4% paraformaldehyde in PBS. Fixation was 
followed by 3x5 min washes in PBS prior to a 
1-min wash in 70% ethanol. Fixed sections 
were then stored in 70% ethanol at 4°C prior 
to use and for up to 1 month. Human sec- 
tions were photobleached using a 150-W LED 
array for 72 hours at 4°C prior to hybridiza- 
tion then washed in 5 ml Sample Prep Wash 
Buffer (VIZGEN 20300001) in a 5-cm petri dish. 
Sections were then incubated in 5 ml Forma- 
mide Wash Buffer (VIZGEN 20300002) at 37°C 
for 30 min. Sections were hybridized by placing 
50 µl of VIZGEN-supplied Gene Panel Mix onto 
the section, covering with parafilm, and incu- 
bating at 37°C for 36 to 48 hours in a humidi- 
fied hybridization oven. Following hybridization, 
sections were washed twice in 5 ml Formamide 
Wash Buffer for 30 min at 47°C. Sections were 
then embedded in acrylamide by polymeriz- 
ing VIZGEN Embedding Premix (VIZGEN 
20300004) according to the manufacturer’s 
instructions. Sections were embedded by in- 
verting sections onto 110 µl of Embedding 
Premix and 10% Ammonium Persulfate (Sigma 
A3678) and TEMED (BioRad 161-0800) solution 
applied to a Gel Slick (Lonza 50640) treated 
2x3 glass slide. The coverslips were pressed 
gently onto the acrylamide solution and allowed 
to polymerize for 1.5 hours. Following embedding, 
sections were cleared for 24 to 48 hours 
with a mixture of VIZGEN Clearing Solution 
(VIZGEN 20300003) and Proteinase K (New 
England Biolabs P8107S) according to the 
manufacturer's instructions. Following clearing, 
sections were washed twice for 5 min in 
Sample Prep Wash Buffer (PN 20300001). 
VIZGEN DAPI and PolyT Stain (PN 20300021) 
were applied to each section for 15 min fol- 
lowed by a 10-min wash in Formamide Wash 
Buffer. Formamide Wash Buffer was removed 
and replaced with Sample Prep Wash Buffer 
during MERSCOPE setup. 100 µl of RNase 
Inhibitor (New England BioLabs M0314L) 
was added to 250 µl of Imaging Buffer Acti- 
vator (PN 203000015), and this mixture was 
added via the cartridge activation port to a 
prethawed and mixed MERSCOPE Imaging 
cartridge (VIZGEN PN1040004). 15 ml mine- 
ral oil (Millipore-Sigma m5904-6X500ML) was 
added to the activation port and the MER- 
SCOPE fluidics system was primed accord- 
ing to VIZGEN instructions. The flow chamber 
was assembled with the hybridized and 
cleared section coverslip according to VIZGEN 
specifications, and the imaging session was 
initiated after collection of a 10X mosaic DAPI 
image and selection of the imaging area. For 
specimens that passed minimum count thresh- 
old, imaging was initiated and processing 
completed according to VIZGEN proprietary 
protocol. Following image processing and seg- 
mentation, cells with fewer than 50 transcripts 
were eliminated, as well as cells with volumes 
falling outside a range of 100 to 300 µm³. 


Gene panel selection 


The 140-gene human cortical panel was selected 
using a combination of manual and algorith- 
mic based strategies requiring a reference single- 
cell/nucleus RNA-seq dataset from the same 
tissue, in this case the human MTG snRNA-seq 
dataset and resulting taxonomy (27). First, the 
reference RNA-seq dataset is filtered to only 
include genes compatible with mFISH. Retained 
genes need to be (i) long enough to allow probe 
design (> 960 base pairs); (ii) expressed highly 
enough to be detected (FPKM >= 10), but not 
so high as to overcrowd the signal of other genes 
in a cell (FPKM < 500); (iii) expressed with low 
expression in off-target cells (FPKM < 50 in 
nonneuronal cells); and (iv) differentially ex- 
pressed between cell types [top 500 remaining 
genes by marker score (94)]. To more evenly 
sample each cell type, the reference dataset is 
also filtered to include a maximum of 50 cells per 
cluster. Second, an initial set of high-confidence 
marker genes is selected through a combi- 
nation of literature search and analysis of the 
reference data. 
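
The four gene-level filters can be written down as a short sketch. The `Gene` record, its field names, and the example genes are hypothetical. In particular, it is an assumption that the FPKM bounds apply to a gene's highest cluster-level expression, since the text does not say which summary value is thresholded.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    length_bp: int
    max_fpkm: float          # assumed: highest cluster-level FPKM
    nonneuronal_fpkm: float  # assumed: highest FPKM among non-neuronal clusters
    marker_score: float

def mfish_compatible(genes, top_n=500):
    """Apply criteria (i) to (iii), then keep the top_n genes by marker score (iv)."""
    passing = [
        g for g in genes
        if g.length_bp > 960                # (i) long enough for probe design
        and 10 <= g.max_fpkm < 500          # (ii) detectable but not overcrowding
        and g.nonneuronal_fpkm < 50         # (iii) low off-target expression
    ]
    passing.sort(key=lambda g: g.marker_score, reverse=True)
    return passing[:top_n]

# Hypothetical genes, for illustration only.
genes = [
    Gene("A", 1500, 120.0, 5.0, 0.90),
    Gene("B", 800, 120.0, 5.0, 0.95),   # too short
    Gene("C", 2000, 900.0, 5.0, 0.80),  # expressed too highly
    Gene("D", 2000, 60.0, 80.0, 0.70),  # high off-target expression
    Gene("E", 3000, 15.0, 0.0, 0.99),
]
selected = [g.name for g in mfish_compatible(genes)]
```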

The main step of gene selection uses a greedy 
algorithm to iteratively add genes to the initial 
set. To do this, each cell in the filtered reference 
dataset is mapped to a cell type by taking the 
Pearson correlation of its expression levels with 
each cluster median using the initial gene set of 
size n, and the cluster corresponding to the 
maximum value is defined as the “mapped 
cluster.” The “mapping distance” is then defined 
as the average cluster distance between the 
mapped cluster and the originally assigned 
cluster for each cell. In this case, a weighted 
cluster distance, defined as one minus the 
Pearson correlation between cluster medians 
calculated across all filtered genes, is used to 
penalize cases where cells are mapped to very 
different types; but an unweighted distance, 
defined as the fraction of cells that do not map 
to their assigned cluster, could also be used. 
This mapping step is repeated for every pos- 
sible n+1 gene set in the filtered reference 
dataset, and the set with minimum cluster 
distance is retained as the new gene set. These 
steps are repeated with the new gene set (of 
size n+1) until a gene panel of the desired size 
is attained. Code for reproducing this gene 
selection strategy is available as part of the 
mfishtools R library (https://github.com/ 
AllenInstitute/mfishtools). 
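
The greedy loop described above can be sketched in pure Python as follows. This is not the mfishtools implementation (which is in R); the function names, data layout, and toy data are hypothetical, and ties in the correlation are broken by cluster order, which is an arbitrary choice.

```python
from statistics import median

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sxx * syy) ** 0.5

def cluster_medians(expr, labels):
    """expr: {gene: [value per cell]}; returns {cluster: {gene: median}}."""
    return {
        c: {g: median(v for v, l in zip(vals, labels) if l == c) for g, vals in expr.items()}
        for c in sorted(set(labels))
    }

def mapping_distance(genes, expr, labels, medians, cluster_dist):
    """Average weighted distance between each cell's mapped and assigned cluster."""
    total = 0.0
    for i, assigned in enumerate(labels):
        cell = [expr[g][i] for g in genes]
        mapped = max(medians, key=lambda c: pearson(cell, [medians[c][g] for g in genes]))
        total += cluster_dist[assigned][mapped]
    return total / len(labels)

def greedy_panel(initial, candidates, expr, labels, panel_size):
    """Add one gene at a time, always the one that minimizes the mapping distance."""
    medians = cluster_medians(expr, labels)
    all_genes = list(expr)
    # Weighted cluster distance: one minus the correlation of cluster medians.
    cluster_dist = {
        a: {
            b: 0.0 if a == b else 1 - pearson(
                [medians[a][g] for g in all_genes], [medians[b][g] for g in all_genes]
            )
            for b in medians
        }
        for a in medians
    }
    panel = list(initial)
    remaining = [g for g in candidates if g not in panel]
    while len(panel) < panel_size and remaining:
        best = min(
            remaining,
            key=lambda g: mapping_distance(panel + [g], expr, labels, medians, cluster_dist),
        )
        panel.append(best)
        remaining.remove(best)
    return panel

# Toy data: g3 separates clusters A and B, g4 does not.
expr = {
    "g1": [5, 5, 5, 5, 5, 5],
    "g2": [1, 1, 1, 1, 1, 1],
    "g3": [9, 10, 11, 0, 1, 0],
    "g4": [3, 0, 3, 0, 3, 3],
}
labels = ["A", "A", "A", "B", "B", "B"]
panel = greedy_panel(["g1", "g2"], ["g3", "g4"], expr, labels, panel_size=3)
```

Each iteration evaluates every remaining candidate, so the cost grows with the number of candidates times the number of cells; capping cells per cluster, as described above, also keeps this loop manageable.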


Mapping transcriptomic types and 
calculating proportions 


Any genes not matched across both the 
MERSCOPE gene panel and the mapping taxo- 
nomy were filtered from the dataset before 
starting. From there, cluster means were cal- 
culated by dividing the number of cells per 
cluster by the number of clusters collected. 
Next, we created a training dataset by finding 
marker genes for each cluster by calculating 
the Euclidean distance between all clusters 
and the mean counts of each gene per cluster. 
This training dataset was fed into a KNN clas- 
sifier alongside the MERSCOPE cell-by-gene 
panel to iteratively calculate the best possible 
gene matches per cluster. Proportions of L1 
t-types present in L1 were calculated in sam- 
ples from five donors by manually drawing L1 
borders and selecting the corresponding cells. 
INTRODUCTION: The cerebral cortex is involved 
in complex cognitive functions such as lan- 
guage. Although the diversity and organiza- 
tion of cortical cell types has been extensively 
studied in several mammalian species, human 
cortical specializations that may underlie our 
distinctive cognitive abilities remain poorly 
understood. 


RATIONALE: Single-nucleus RNA sequencing 
(snRNA-seq) offers a relatively unbiased char- 
acterization of cellular diversity of brain regions. 
Comparative transcriptomic analysis enables 
the identification of molecular and cellular 
features that are conserved and specialized 
but is often limited by the number of species 
analyzed. We applied deep transcriptomic 
profiling of the cerebral cortex of humans 
and four nonhuman primate (NHP) species 
to identify homologous cell types and human 
specializations. 


Divergent gene expression in the primate neocortex. (A) Proportions of neuronal subclasses are conserved 
across species, except for increased proportions of three subclasses (asterisks) in marmosets. Among great 
apes, neuronal gene expression has evolved faster on the human lineage, and glial expression has diverged 
faster than neuronal expression in all species. (B) Many human-specific DEGs are associated with circuit 
function and are linked to potentially adaptive changes in gene regulation. 

RESULTS: We generated snRNA-seq data from 
humans, chimpanzees, gorillas, rhesus macaques, 
and marmosets (more than 570,000 nuclei in to- 
tal) to build a cellular classification of a language- 
associated region of the cortex, the middle 
temporal gyrus (MTG), in each species and a 
consensus primate taxonomy. Cell-type propor- 
tions and distributions across cortical layers 
are highly conserved among great apes, whereas 
marmosets have higher proportions of L5/6 IT 
CAR3 and L5 ET excitatory neurons and Chan- 
delier inhibitory neurons. This strongly points to 
the possibility that other cellular features drive 
human-specific cortical evolution. Profiling go- 
rillas enabled discrimination of which human 
and chimpanzee expression differences are spe- 
cialized in humans. We discovered that chim- 
panzee neurons have gene expression profiles 
that are more similar to those of gorilla neurons 
than to those of human neurons, despite chim- 
panzees and humans sharing a more-recent 
common ancestor. By contrast, glial expression 
changes were consistent with evolutionary dis- 
tances and were more rapid than neuronal ex- 
pression changes in all species. Thus, our data 
support a faster divergence of neuronal, but not 
glial, expression on the human lineage. For all 
primate species, many differentially expressed 
genes (DEGs) were specific to one or a few cell 
types and were significantly enriched in molec- 
ular pathways related to synaptic connectivity 
and signaling. Hundreds of genes had human- 
specific differences in transcript isoform 
usage, and these genes were largely distinct 
from DEGs. We leveraged published datasets 
to link human-specific DEGs to regions of 
the genome with human-accelerated muta- 
tions or deletions (HARs and hCONDELs). 
This led to the surprising discovery that a 
large fraction of human-specific DEGs (15 to 
40%), and particularly those associated with 
synaptic connections and signaling, were near 
these genomic regions that are under adaptive 
selection. 


CONCLUSION: Our study found that MTG cell 
types are largely conserved across approxi- 
mately 40 million years of primate evolution, 
and the composition and spatial positioning of 
cell types are shared among great apes. In each 
species, hundreds of genes exhibit cell type- 
specific expression changes, particularly in 
pathways related to neuronal and glial com- 
munication. Human-specific DEGs are enriched 
near likely adaptive genomic changes and are 
poised to contribute to human-specialized cor- 
tical function. 
The cognitive abilities of humans are distinctive among primates, but their molecular and cellular 
substrates are poorly understood. We used comparative single-nucleus transcriptomics to analyze 
samples of the middle temporal gyrus (MTG) from adult humans, chimpanzees, gorillas, rhesus 
macaques, and common marmosets to understand human-specific features of the neocortex. Human, 
chimpanzee, and gorilla MTG showed highly similar cell-type composition and laminar organization as 
well as a large shift in proportions of deep-layer intratelencephalic-projecting neurons compared with 
macaque and marmoset MTG. Microglia, astrocytes, and oligodendrocytes had more-divergent 
expression across species compared with neurons or oligodendrocyte precursor cells, and neuronal 
expression diverged more rapidly on the human lineage. Only a few hundred genes showed human- 
specific patterning, suggesting that relatively few cellular and molecular changes distinctively define 
adult human cortical structure. 


Humans have distinctive cognitive abili- 
ties compared with nonhuman primates 
(NHPs), including chimpanzees, which 
are our closest evolutionary cousins. For 
example, humans have a capacity for 
vocal learning that requires a highly inter- 
connected set of brain regions, including the 
middle temporal gyrus (MTG) region of the 
neocortex, which integrates multimodal sen- 
sory information and is critical for visual and 
auditory language comprehension (1, 2). Human 
MTG is larger and more connected to other 
language-associated cortical areas than the MTG 
of chimpanzees and other NHPs (3-5). These 
gross anatomical differences may be accom- 
panied by changes in the molecular programs 
of cortical neurons and non-neuronal cells. 
Indeed, previous studies have identified hundreds 
of genes with up- or down-regulated expression 
in the cortex of humans compared with that 
of chimpanzees and other primates (6-9) but 
have been limited to comparing broad popu- 
lations of cells or have lacked another great 
ape species in which to study changes specific 
to the human lineage. 

Single-nucleus RNA sequencing (snRNA-seq) 
has enabled the generation of high-resolution 
transcriptomic taxonomies of cell types in the 
neocortex and other brain regions. Compara- 
tive analysis has established homologous cell 
types across mammals, including humans and 
NHPs, and identified conserved and specialized 
features: cellular proportions (10), spatial dis- 
tributions (11), and transcriptomic and epigeno- 
mic profiles (12). In this study, we profiled more 
than 570,000 single nuclei using RNA sequenc- 
ing from the MTG of five species: human, two 
other great apes (chimpanzee and gorilla), a 
cercopithecid monkey (rhesus macaque), and a 
platyrrhine monkey (common marmoset). On the 
basis of a recently published mammalian phy- 
logeny (13), this represents approximately 
38 million years of evolution since these pri- 
mate species shared a last common ancestor 
and encompasses the relatively recent diver- 
gence of the human lineage from that of chim- 
panzees at 6 million years ago. 

We defined cell-type taxonomies for each 
species and a consensus taxonomy of 57 homo- 
logous cell types that were conserved across 
these primates. This enabled a comparison 
of the cellular architecture of the cortex in 
humans with that of a representative sample 
of non-human primates at high resolution to 
disentangle evolutionary changes in cellular 
composition from gene expression profiles. 
Including gorillas as a third great ape spe- 
cies enabled us to infer which differences be- 
tween humans and chimpanzees are newly 
evolved in humans. Including two phylogeneti- 
cally diverse monkey species enabled us to iden- 
tify the cellular specializations that humans 
share with other great apes that may contrib- 
ute to our enhanced cognitive abilities. Finally, 
we identified a subset of changes that may 
be adaptive by establishing putative links be- 
tween human accelerated regions (HARs) and 
human conserved deletions (hCONDELs) and 
human expression specializations by leverag- 
ing recently generated datasets of the in vitro 
activity of HARs (14) and cell type-specific 
chromatin folding (15, 16). 


Within-species cell-type taxonomies 


MTG cortical samples were collected from 
postmortem adult male and female human, 
chimpanzee (Pan troglodytes), gorilla (Gorilla 
gorilla), rhesus macaque (Macaca mulatta), 
and common marmoset (Callithrix jacchus) 
individuals for snRNA-seq (Fig. 1B). MTG was 
identified in each species using gross anatomi- 
cal landmarks. Layer dissections for human, 
chimpanzee, and gorilla datasets were iden- 
tified and sampled as previously described 
(17). MTG slabs were sectioned and stained 
with fluorescent Nissl, and layers were micro- 
dissected and processed separately for nuclear 
isolation. 

Fig. 1. Transcriptomic cell-type taxonomies of human and NHP MTG. 
(A) Representative Nissl-stained cross sections of MTG in the five species 
profiled. The inset shows the approximate MTG region dissected from human 
brain. (B) Phylogeny of species (left; MYA, millions of years ago) and bar plots of 
nuclei that passed quality control (center) and sampled individuals (right) for 
each dataset. (C) Uniform manifold approximation and projection (UMAP) plots 
of single nuclei from human MTG integrated across individuals and RNA-seq 
technologies and colored by cluster, individual ID, and dissected layer. (D) From 
top to bottom are the following: a human taxonomy dendrogram based on Cv3 
cluster median expression; a heatmap of laminar distributions estimated from 
SSv4 layer dissections; violin plots of the relative cortical depth (pia to white 
matter) of cells grouped by type based on in situ measurement of marker 
expression in human MTG; a dot plot of cell-type abundance represented as a 
proportion of class (excitatory, inhibitory, glia), where error bars denote 
standard deviation across Cv3 individuals (L5 is the only dissection excluded); 
bar plots indicating the proportion of each cluster that is composed of Cv3 all 
layers, Cv3 layer 5 only, and SSv4 layer dissected datasets; bar plots 
indicating the proportion of each cluster that is composed of each individual; 
and violin plots showing the number of distinct genes detected from Cv3 
datasets. 

For humans, single nuclei from seven indi- 
viduals contributed to three RNA-seq datasets: 
a Chromium 10x v3 (Cv3) dataset sampled from 
all six cortical layers (n = 107,000 nuclei), a Cv3 
dataset sampled from microdissected layer 5 
to capture rare excitatory neuron types (n = 
36,000), and our previously characterized (17) 
SMARTseq v4 (SSv4) dataset of six micro- 
dissected layers (n = 14,500). Chimpanzee 
(n = 7 individuals) datasets included Cv3 
across layers (n = 109,000 nuclei) and SSv4 
layer dissections (n = 3900), and gorilla (n = 4) 
datasets included Cv3 (n = 136,000) and SSv4 
(n = 4400). Macaque (n = 3) and marmoset (n = 3) 
datasets included Cv3 from all layers (n = 89,700 
and 76,900 nuclei, respectively). All nuclei prep- 
arations were stained for the pan-neuronal 
marker NeuN and purified by fluorescence- 
activated cell sorting (FACS) to enrich for neurons 
over non-neuronal cells. Samples contain- 
ing 90% NeuN+ (neurons) and 10% NeuN- 
(non-neuronal cells) nuclei were used for li- 
brary preparations and sequencing. Nuclei 
from Cv3 experiments were sequenced to a 
saturation target of 60%, resulting in approx- 
imately 120,000 reads per nucleus. Nuclei 
from SSv4 experiments were sequenced to a 
target of 500,000 reads per nucleus. 

Each species was independently analyzed to 
generate a “within-species” taxonomy of cell 
types. First, datasets were annotated with cell 
subclass labels from our published human MTG 
and primary motor (M1) taxonomies (12, 17) 
using Seurat (18). Cell types were grouped into 
five neighborhoods—intratelencephalic (IT)- 
projecting and non-IT-projecting excitatory 
neurons, caudal ganglionic eminence (CGE)- 
and medial ganglionic eminence (MGE)-derived 
interneurons, and non-neuronal cells—that were 
analyzed separately. High-quality nuclei were 
normalized using SCTransform (19) and inte- 
grated across individuals and data modalities 
using canonical correlation analysis. Human 
nuclei were well mixed across the three datasets 
and across individuals (Fig. 1C), and similar 
mixing was observed for the other species (figs. 
S1 and S2). The integrated space was clustered 
into small “metacells” that were merged into 
151 clusters (Fig. 1D and fig. S3) that included 
nuclei from all datasets and individuals. Cell 
types had robust gene detection (neuronal, 
median 3000 to 9000 genes; non-neuronal, me- 
dian 1500 to 3000 genes) and were often rare 
(less than 1 to 2% of the cell class) and re- 
stricted to one or two layers (table S1). Single 
nuclei from the other four species were clustered 
using identical parameters, resulting in 109 clus- 
ters in chimpanzees (fig. S1A), 116 in gorillas 
(fig. S1B), 120 in macaques (fig. S2A), and 104 in 
marmosets (fig. S2B). Humans had the most 
cell-type diversity (151 clusters), although the 
number of cell types could have been driven 
by technical factors: sampled individuals (only 
female macaques), tissue dissections (additional 
layer 5 sampling for humans), RNA-seq method 
(SSv4 included for great apes), and genome an- 
notation quality. 

Species cell types were hierarchically orga- 
nized into dendrograms based on transcrip- 
tomic similarity (Fig. 1D and figs. S1 to S3) and 
grouped into three major cell classes: excitatory 
(glutamatergic) neurons, inhibitory [γ-amino- 
butyric acid-releasing (GABAergic)] neurons, 
and non-neuronal cells. Each of the three major 
classes was further divided into cell neighbor- 
hoods and subclasses based on an integrated 
analysis of marker gene expression, layer dis- 
sections, and comparison with published cor- 
tical cell types (12). In total, we identified 24 
conserved subclasses (18 neuronal, 6 non- 
neuronal) (fig. S4A) that were used as a prefix 
for cell-type labels. Inhibitory neurons com- 
prised five CGE-derived subclasses (LAMP5 
LHX6, LAMP5, VIP, PAX6, and SNCG) expressing 
the marker ADARB2 and four MGE-derived sub- 
classes (Chandelier, PVALB, SST, and SST CHODL) 
expressing LHX6. Excitatory neurons included 
five IT-projecting subclasses (L2/3 IT, L4 IT, L5 
IT, L6 IT, and L5/6 IT CAR3) and four deep-layer 
non-IT-projecting subclasses (L5 ET, L5/6 NP, 
L6b, and L6 CT). Non-neuronal cells were 
grouped into six subclasses: astrocytes, oligo- 
dendrocyte precursor cells (OPCs), oligoden- 
drocytes, microglia and perivascular macrophages 
(micro/PVMs), endothelial cells, and vascular 
and leptomeningeal cells (VLMCs). 

This human MTG taxonomy provided sub- 
stantially higher cell-type resolution than our 
previously published human cortical taxono- 
mies (12, 17), likely because of increased sam- 
pling (155,000 versus 15,000 to 85,000 nuclei; 
fig. S3). Furthermore, the in situ spatial dis- 
tributions of cell types were characterized 
using MERFISH and are included as a gallery 
of human MTG sections (data S1) and sum- 
marized by cortical depth (Fig. 1D). All cell types 
matched one to one or one to many, and diver- 
sity was particularly expanded for non-neuronal 
subclasses and several neuronal subclasses and 
types: L5/6 NP (six types), L6 CT (four types), 
L2/3 IT FREM3 (eight types), and SST CALB1 
(nine types). The FREM3 subtypes had a graded 
distribution across layers 2 and 3, consistent 
with spatial variation in FREM3 neuron mor- 
phology and electrophysiology (20). 


Divergent abundances of cell types 


Neuronal subclass frequencies were estimated 
as a proportion of excitatory and inhibitory 
neuron classes based on snRNA-seq sampling 
to account for species differences in the ratio 
of excitatory to inhibitory neurons (E:I ratio) 
(fig. S4B) (12). Subclass proportions were highly 
consistent across individuals within each spe- 
cies and varied significantly [one-way analysis 
of variance (ANOVA), P < 0.05] across spe- 
cies (Fig. 2A). Post hoc pairwise t tests be- 
tween humans and each NHP identified up to 
fivefold more L5/6 IT CAR3, L5 ET, and PVALB- 
expressing chandelier interneurons in marmo- 
sets. Interestingly, L2/3 IT neurons had similar 
proportions in the MTG, in contrast to the 50% 
expansion of L2/3 IT neurons in humans versus 
marmosets in M1 (12). 
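
The normalization used here, expressing each subclass as a fraction of its own class rather than of all nuclei, can be illustrated with a short sketch. The function name, class labels, and counts are hypothetical.

```python
from collections import Counter

def subclass_proportions(nuclei):
    """nuclei: list of (cell_class, subclass) pairs, one per nucleus.

    Returns {subclass: fraction of nuclei in the same class}, so that the
    excitatory-to-inhibitory sampling ratio does not affect the result.
    """
    class_counts = Counter(cell_class for cell_class, _ in nuclei)
    subclass_counts = Counter(nuclei)
    return {
        subclass: count / class_counts[cell_class]
        for (cell_class, subclass), count in subclass_counts.items()
    }

# Hypothetical nuclei from one individual.
nuclei = (
    [("excitatory", "L2/3 IT")] * 6
    + [("excitatory", "L5 ET")] * 2
    + [("inhibitory", "PVALB")] * 3
    + [("inhibitory", "VIP")] * 1
)
proportions = subclass_proportions(nuclei)
```

Computing these fractions per individual gives the replicate values that a one-way ANOVA across species would then compare.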

Among L5/6 IT CAR3 neurons, two distinct 
subtypes had high and low CUX2 expression, 
respectively, in all species (Fig. 2B). HTR2C 
and MGAT4C were additional conserved markers 
of the high-CUX2 subtype, and BCL11A and 
LDB2 were markers of the low-CUX2 subtype 
(Fig. 2C). Subtype proportions were balanced 
in great apes, mostly low-CUX2 in macaques, 
and mostly high-CUX2 in marmosets (Fig. 2D). 
Low-CUX2 neurons were consistently more 
enriched in deeper layers than high-CUX2 neu- 
rons in all three great apes (Fig. 2E). In human 
and macaque MTG, in situ labeling of marker 
genes using MERFISH (Fig. 2F) validated that 
the low-CUX2 subtype was enriched at the 
border of layers 5 and 6, and the high-CUX2 
subtype extended from upper layer 6 through 
layer 5. In macaque MTG, the proportion of 
high-CUX2 neurons varied along the gyrus 
(Fig. 2F) with little on the ventral side, con- 
sistent with the snRNA-seq data, and more on 
the dorsal side. In marmosets, in situ labeling 
of marker genes using RNAscope showed that 
high-CUX2 neurons were enriched in MTG 
(TPO and TE3), consistent with snRNA-seq data, 
and in adjacent secondary auditory regions 
(Fig. 2F). On the basis of the snRNA-seq data 
that we collected from seven additional re- 
gions of the human cortex (21), low-CUX2 
neurons were more common in many regions, 
and high-CUX2 neurons were enriched in tem- 
poral cortex (MTG and primary auditory, A1) 
and parietal cortex (angular gyrus, ANG and 
primary somatosensory, S1) (Fig. 2G). Similarly, 
snRNA-seq data collected from six additional 
regions of the marmoset cortex (10, 12, 22) 
revealed that the high-CUX2 subtype was most 
enriched in temporal areas (MTG and A1) and 
less enriched in S1 (Fig. 2G). 


Primate specializations of cell-type expression 


Next, we compared the transcriptomic simi- 
larity of subclasses across primates. For each 
species, we defined gene markers that could 
reliably predict the subclass identities of cells 
and were filtered to include one-to-one ortho- 
logs (table S2). Non-neuronal subclasses ex- 
pressed hundreds of markers and demonstrated 
greater distinction than neuronal subclasses that 
had 50 to 100 markers. Each subclass had a 
similar number of markers in all species (Fig. 
3A), but only 10 to 20% had strongly conserved 
specificity (Fig. 3, A and B). To compare the 
global expression profile of subclasses across 
primates, we correlated normalized median 
expression of variable genes between each spe- 
cies pair for each cell subclass (excluding under- 
sampled endothelial cells and VLMCs) (Fig. 3C). 
Glial cells (except OPCs) had greater expression 
changes between species compared with neu- 
rons. Expression similarity decreased with evo- 
lutionary distance between human and NHPs 
at a similar rate across neuronal subclasses and 
OPCs and faster in oligodendrocytes, astrocytes, 
and microglia in particular (Fig. 3D). Glial ex- 
pression remained more divergent across spe- 
cies after normalizing for increased variation 
within species (fig. S4C). Chimpanzee neuro- 
nal subclasses were more similar to those of 
gorillas than to those of humans (Fig. 3E), 
despite a more recent common ancestor with 
humans (6 million versus 7 million years ago). 
This was consistent with the faster evolution 
of neurons on the human lineage since the 
divergence with chimpanzees. By contrast, 
there was no evidence for faster divergence 
among non-neuronal cells on the human line- 
age (Fig. 3E) or on the lineage leading to great 
apes (fig. S4D). 
In addition to evolutionary changes in gene 
transcript amounts, there may be changes in 
transcript isoform usage. We quantified iso- 
form expression using full-length transcript 
information from SSv4 RNA-seq data acquired 
from great apes. For each cell subclass, we 
identified differentially expressed genes (DEGs) 
(table S3) and genes with at least moderately 
high expression that strongly switched isoform 
usage between each pair of species (table S4). 
There was little overlap between genes with 
differential expression and isoform usage for 
L2/3 IT neurons (fig. S4E). Genes with a human- 
specialized switch in isoform expression in- 
cluded BCAR1, INO80B, and SBNO1 (fig. S4F). 
BCAR1 is a scaffold protein that is a compo- 
nent of the netrin signaling pathway and is 
involved in axon guidance (23). INO80B (24) and 
SBNO1 are involved in chromatin remodeling, 
and SBNO1 contributes to brain-axis develop- 
ment in zebrafish (25) and is a risk gene for 
intellectual disability (26). The predominant 
isoform of INO80B in human L2/3 IT neurons 
includes a retained intron (fig. S4G) that may 
suppress transcription of this gene (27) and 
contribute to human specializations. 

Finally, we quantified the conservation of 
gene expression patterns across cell types 
between humans and NHPs. As expected, expression
differences increased with evolutionary
distance (fig. S4H), and 75% of genes were
conserved in all species [Spearman correlation
(r) > 0.9 in great apes; r > 0.65 in marmosets].
Highly divergent expression (r < 0.25) was
observed for 651 genes and often in only a
single species (fig. S4I), such as FAM177B, which
was exclusively expressed in human microglia
(fig. S4J). A few genes had fixed derived expression
in the great ape lineage. For instance,
MEPE encodes a secreted calcium-binding phosphoprotein,
and its expression was restricted
to PVALB-expressing interneurons in great apes
(fig. S4J); prolactin receptor (PRLR) had enriched
expression in SST-expressing interneurons and
L5/6 IT CAR3 neurons in great apes compared
with CGE-derived interneurons in macaques
and marmosets, which potentially alters hormonal
modulation of these neurons.
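
The conservation measure described above can be illustrated with a minimal Python sketch. It computes a Spearman correlation between one gene's expression profile across cell types in two species and applies the great ape thresholds quoted in the text (r > 0.9 for conserved, r < 0.25 for highly divergent). The expression values, function names, and the "intermediate" label are hypothetical and are not taken from the study's analysis code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_gene(r, conserved_cutoff=0.9, divergent_cutoff=0.25):
    """Label a gene using the cutoffs quoted in the text for great apes."""
    if r > conserved_cutoff:
        return "conserved"
    if r < divergent_cutoff:
        return "highly divergent"
    return "intermediate"  # hypothetical label for everything in between


# Hypothetical mean expression of one gene across four cell types.
human = [5.0, 1.0, 0.2, 0.1]
chimp = [4.0, 1.5, 0.3, 0.05]
human_only = [0.0, 0.0, 0.0, 9.0]   # expressed in a single human cell type
other = [0.1, 0.2, 0.3, 0.0]

print(classify_gene(spearman(human, chimp)))
print(classify_gene(spearman(human_only, other)))
```

Rank-based correlation is used here because it depends only on the ordering of cell types by expression, so it is insensitive to species differences in overall expression scale.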


Human specializations of glial cells 


Because glial cells exhibited the most diver- 
gent gene expression changes across species 
(Fig. 3, C and D), we next aimed to uncover 
their specialized transcriptional programs in 
humans versus other great apes. For astrocytes,
we found more human DEGs (1189)
than chimpanzee (787) or gorilla (617) DEGs
(Fig. 4, A and B; fig. S5A; and table S3) and 
three times more highly divergent (>10-fold) 
human DEGs (Fig. 4A). Human astrocyte DEGs 
were enriched in synaptic signaling and protein 
translation pathways on the basis of enrich- 
ment analyses using Gene Ontology (GO) (Fig. 
4C) and Synaptic GO (SynGO) (28) (Fig. 4D and 
fig. S5B) databases. To study synapse-related 
astrocytic gene programs, we used a molecular 
database of astrocyte cell-surface molecules 
enriched at astrocyte-neuron junctions from 
an in vivo proteomic labeling approach in 
the mouse cortex (29). Among genes encoding 
118 proteins that were robustly enriched in 
perisynaptic astrocytic processes, 24 genes (20%) 
were differentially expressed in human astro- 
cytes compared with chimpanzee and gorilla 
astrocytes (Fig. 4E), and 47 genes (40%) had 
conserved expression across great apes (fig. S5, 
C and D). Neuroligins and neurexins, which 
are ligand-receptor pairs that play a key role 
in astrocytic morphology and synaptic devel- 
opment (30), showed divergent expression 
patterns across great ape species (Fig. 4, F and 
G). Other cell-adhesion gene families with well- 
known functions in astrocytic morphological 
and synaptic development also had multiple 
members among human astrocyte DEGs, including
ephrins and their cognate receptors
(EFNA5, EPHA6), clustered protocadherins
(PCDH9), and teneurins (TENM2, TENM3,
TENM4) (fig. S5, D and F).

Fig. 4. Human cortical astrocytes have specialized molecular features.
(A) Upset plot showing the number of DEGs in cortical astrocytes for pairwise
comparisons between great ape species. The inset shows the number of highly
divergent genes (fold change >10). (B) Heatmap showing row-scaled expression
of human DEGs versus chimpanzee and gorilla DEGs. (C and D) Significantly
enriched GO (C) and SynGO (D) terms in the union of astrocyte DEGs from the
pairwise comparison between human and chimpanzee and the pairwise
comparison between human and gorilla (FDR <0.05). (E) Heatmap showing
human DEGs (FDR <0.01, normalized gene count >5) of the proteome of
perisynaptic astrocytic processes (29). (F) Schematic illustrating the
transcellular interaction of astrocytic neuroligins and neuronal neurexins that is known
to play a role in astrocytic morphology and synaptic development. (G) Box
plots showing differential gene expression of neuroligins and neurexins in
astrocytes across primate species. (H) Schematic illustrating ligand-receptor
interactions of the neuregulin-ErbB signaling pathway. (I) Box plots showing
differential expression of the ligand-encoding genes NRG2 and NRG3 and the
receptor-encoding genes EGFR and ERBB4 in astrocytes across primate species.
(J) Gene expression patterns of ERBB4 across astrocyte subtypes and great
ape species. The dashed lines indicate the labeled astrocyte subtypes. ILM,
interlaminar. For box plots in (G) and (I), the center line represents the median,
box limits are upper and lower quartiles, and whiskers are minimum and
maximum values.

In addition to cell-adhesion programs, we
explored other cell-surface or secreted ligands
and receptors that contribute to astrocyte function.
We found that several astrocyte-secreted
synaptogenic molecules, such as osteonectin
(SPARC) and hevin (SPARCL1), and extracellular
matrix (ECM)-related proteins (brevican, BCAN;
neurocan, NCAN; and phosphacan, PTPRZ1)
were up-regulated in human astrocytes (fig.
S5, E and F). Of note, four members of the
neuregulin-ErbB signaling pathway showed
differential gene expression in great ape astrocytes,
with two receptors (EGFR and ERBB4)
displaying expression changes in opposite directions
(Fig. 4, H and I). Up-regulation of human
ERBB4 expression was higher in protoplasmic
and fibrous astrocytes than in interlaminar
astrocytes (Fig. 4J and fig. S5G), demonstrating
that transcriptional specializations can occur
in a subtype-specific fashion. Finally, glutamate
AMPA receptor subunits (GRIA1, GRIA2, GRIA4)
had more than threefold greater expression in
human astrocytes compared with chimpanzee
and gorilla astrocytes, suggesting a human-specific
astrocyte responsiveness to glutamate
(fig. S5H).

We next examined gene expression changes 
in microglia, which also play critical roles in 
cortical circuit formation (31, 32). Recent com- 
parative spatial transcriptomic data indicate 
that microglia-neuron contacts are more prev- 
alent in human cortical circuits compared 
with those of mice, particularly in superficial 
layers (33). We reasoned that evolutionary 
changes in microglial connectivity could be 
mediated by fine-tuning the expression of cell- 
surface ligands and receptors. Indeed, we 
found that human microglia have more DEGs 
(328) than chimpanzee (175) or gorilla (164) 
microglia (fig. S6, A to C), and human DEGs 
were overrepresented in GO and SynGO terms 
related to synaptic compartments (fig. S6, 
D to F). Among the human microglia DEGs
were several disease-associated genes, including
SNCA (which encodes α-synuclein) and
TMEM163, which are implicated in neurodegenerative
disorders (34-36), and Kalirin
(KALRN), which is associated with neurodevelopmental
and neuropsychiatric disorders (37)
(fig. S6G). We also corroborated the human-specific
up-regulation of FOXP2 and CACNA1D,
of which the latter was recently reported in the
dorsolateral prefrontal cortex (9).
Oligodendrocytes also showed human specializations,
including DEGs involved in myelin
organization and cell adhesion (for example,
CNTNAP2 and LAMA2) (fig. S6). Unlike astrocytes
and microglia, human and chimpanzee
oligodendrocytes had similar numbers of DEGs,
although humans had more up-regulated, highly
divergent DEGs (fig. S6I). These findings support
faster divergence of glial expression in the human
lineage that parallels neuronal divergence
and likely affects interactions between glia and
neurons.


Consensus cell-type conservation and divergence 


To further investigate the canonical architec- 
ture of primate MTG, we built a transcripto- 
mic taxonomy of high-resolution consensus cell 
types. Starting with CGE-derived interneurons, 
we integrated single-nucleus expression profiles 
across the five species based on conserved co- 
expression using Seurat (18). Within-species 
cell types remained distinct, and nuclei were 
well integrated (Fig. 5, A and B, and fig. S7), par- 
ticularly for humans and chimpanzees (fig. 
S12A). Similar results were observed for the 
other cell neighborhoods (figs. S8 to S11). Sep- 
arate pairwise alignments between humans 
and NHPs confirmed that cell-type homologies 
were better resolved in more closely related 
species (fig. S12B). We also found that excitatory
neurons were less well integrated than
inhibitory neurons, and this finding was consistent
with greater species specializations of
excitatory types.

Fig. 5. Divergent expression across conserved cell types. (A) UMAPs of
CGE-derived interneuron expression generated for each species and colored by
within-species cell types. (B) UMAPs of CGE interneuron expression integrated
across the five species and with the same coloring as (A). (C) Consensus
taxonomy of 57 homologous cell types identified in all five species (asterisks
indicate a one-to-one match across all species). For great ape species, heatmaps
show the proportion of nuclei dissected from layers 1 through 6 for each type.
The dot plot shows the number of within-species clusters that are associated with
each consensus type. The line plot shows the number of hDEGs per consensus type
with fold change >1.4 for each species (colors as in dot plot). The bar plot shows
the average classification accuracy (F1 score) across the five species using scPoli
(84) (fig. S12). (D) Summary of GO enrichment analysis of species DEGs. Cellular
component terms that were significantly enriched for hDEGs in at least one
consensus type and that form four distinct groups of similar GO terms are shown.
(E) Summary of the number of consensus types that express hDEGs that
were enriched for at least one term in the four semantic GO groups.

We established homologous cell types be- 
tween all pairs of species using MetaNeighbor, 
a statistical framework (38, 39) that identifies
cell types that can be reliably discriminated [area 
under the receiver operating characteristic curve 
(AUROC) >0.6] from nearest neighbors in one 
species based on training data from the other 
species or that are reciprocal best matches. 
Pairwise cell-type homologies were integrated 
to define 57 consensus types that included cell 
types that were identified in the five species, 
and a dendrogram was constructed based on 
transcriptomic similarities (Fig. 5C). The robust- 
ness of the 57 homologous types across species 
was confirmed using a complementary approach 
to consensus clustering, scArches (40) (fig. S12, 
C to H). Classification accuracy varied across
consensus types (Fig. 5C), with nearly
perfect classification performance across species
(average F1 score >0.95) for distinct interneuron
types (LAMP5 LHX6, PAX6_1, and
Chandelier cells) and non-neuronal types 
(astrocytes, oligodendrocytes, and endothelial 
cells). The rare OPC_1 subtype (5% of OPCs) 
had the lowest classification accuracy and some- 
what ambiguous homology across species (fig. 
S11) and may represent different subpopula- 
tions of OPCs across species. Eight consensus 
types represented one-to-one matches across 
all species, and most of the types represented 
multiple matches of between 2 and 10 within- 
species types. Differential sampling of nuclei 
across species owing to differences in dissec- 
tions or cell-type proportions might have con- 
tributed to the number of cell types that 
mapped to a consensus type. For example, in 
human MTG, more nuclei were sampled from 
layer 5 and more subtypes of the layer 5- 
enriched SST_3 consensus type were identi- 
fied. Thus, there was a conserved set of cell 
types in primate MTG with transcriptomic 
specializations of subtypes. Laminar distri- 
butions of types were notably conserved across 
the great apes, except for SST CHODL_1, 
which was present in more superficial layers of 
gorilla MTG (Fig. 5C), although more sampling 
of this rare type is needed for validation. 
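
The two matching criteria described above (AUROC >0.6 or reciprocal best matches) can be sketched as follows. This is a simplified Python illustration and does not use MetaNeighbor's actual interface: the AUROC is computed directly from per-cell scores, and the cell-type names, scores, and AUROC tables are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def reciprocal_best_matches(a_to_b, b_to_a):
    """Pairs (a, b) where b is a's top-scoring match and a is b's.

    a_to_b[a][b] holds the AUROC for recognizing type b in species B
    when training on type a in species A; b_to_a is the reverse.
    """
    matches = []
    for a, row in a_to_b.items():
        best_b = max(row, key=row.get)
        best_a = max(b_to_a[best_b], key=b_to_a[best_b].get)
        if best_a == a:
            matches.append((a, best_b))
    return sorted(matches)


# Hypothetical per-cell scores; True marks cells of the candidate type.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [True, True, False, True, False]
print(auroc(scores, labels) > 0.6)  # threshold quoted in the text

# Hypothetical cross-species AUROC tables for two cell types per species.
human_to_chimp = {
    "SST_h1": {"SST_c1": 0.95, "SST_c2": 0.55},
    "SST_h2": {"SST_c1": 0.60, "SST_c2": 0.90},
}
chimp_to_human = {
    "SST_c1": {"SST_h1": 0.93, "SST_h2": 0.58},
    "SST_c2": {"SST_h1": 0.50, "SST_h2": 0.88},
}
print(reciprocal_best_matches(human_to_chimp, chimp_to_human))
```

Requiring the best match to hold in both directions guards against a rare type in one species being assigned to a broad type in the other simply because no better candidate was sampled.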

Previous work reported the lack of transcript
and protein expression of tyrosine hydroxylase
(TH), which encodes a key enzyme in the
dopamine synthesis pathway, in the neocortex 
of non-human African great apes, including 
chimpanzees and gorillas (41, 42). Recent tran- 
scriptomic profiling of chimpanzee prefrontal 
cortex suggests that this represents a loss of 
dopamine signaling in a conserved cell type 
rather than a loss of a homologous type (9). 
In MTG, we identified nine consensus SST- 
expressing interneuron types present in all five 
primates (Fig. 5E) that had robust sets of conserved
and species-specific markers (fig. S13A
and table S5). The SST_1 consensus type was
distinct from most other MGE-derived interneurons
(fig. S13B) and expressed TH in human,
macaque, and marmoset neurons but not in
chimpanzee or gorilla neurons (fig. S13, C to
E). Conserved (for example, NCAM2, PTPRK,
UNC5D, and CNTNAP5) and species-specific
genes were enriched in pathways for connectivity
and signaling (fig. S13, F to J). SST_1 was
the rarest type in all primates (fig. S13K) and
varied from 0.3% of SST-expressing interneurons
in gorillas and macaques to 1 to 3% in
humans, chimpanzees, and marmosets. Interestingly,
most TH-expressing neurons belonged
to different interneuron subclasses in humans
(SST), macaques (PVALB), and marmosets (VIP)
(fig. S13L), and this was confirmed by in situ
labeling of TH-expressing neurons in human
and macaque MTG (fig. S13M). Dopamine receptor
expression varied across primates but
did not track with predicted differences in
local dopamine production (fig. S13, N and O).
This is likely because subcortical regions pro- 
vide most of the dopaminergic input to the 
neocortex and mask the effects of evolution- 
ary changes in local input. 

We tested for changes in the proportions of 
neuronal consensus types across primates using 
a Bayesian model (scCODA) that accounted for
the compositional nature of the data (fig. S14A) 
(43). We found that the higher E:I ratio in 
marmosets (fig. S4B) was driven by increased 
proportions of most excitatory types, and the 
lower E:I ratio in macaques was driven by 
increased proportions of, in particular, SST and 
VIP interneuron types and by decreased pro- 
portions of L2/3 IT_2, L2/3 IT_3, and L5/6 IT 
CAR3_2 excitatory types. There were smaller 
changes among the great apes, except for an 
increased proportion of L5/6 NP_2 neurons in 
humans and chimpanzees. 
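
The reason a compositional model is needed can be illustrated with a small Python sketch. It is not the Bayesian model used in the study; it only shows, with hypothetical counts and cell-type groupings, that expanding one population lowers the proportions of all others even when their absolute numbers are unchanged, whereas log ratios against a reference type are unaffected.

```python
import math


def proportions(counts):
    """Convert nuclei counts per cell type into proportions summing to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def log_ratios(counts, reference):
    """Log ratio of each type's count to a reference type (counts must be > 0)."""
    ref = counts[reference]
    return {
        name: math.log(n / ref)
        for name, n in counts.items()
        if name != reference
    }


def ei_ratio(counts, excitatory):
    """Excitatory-to-inhibitory ratio given the set of excitatory type names."""
    e = sum(n for name, n in counts.items() if name in excitatory)
    i = sum(n for name, n in counts.items() if name not in excitatory)
    return e / i


# Hypothetical counts: only the excitatory population doubles.
baseline = {"L2/3 IT": 400, "SST": 100, "VIP": 100}
expanded = {"L2/3 IT": 800, "SST": 100, "VIP": 100}

# The SST proportion drops although the SST count is unchanged.
print(proportions(baseline)["SST"], proportions(expanded)["SST"])

# Relative to the VIP reference, SST is unchanged and only L2/3 IT shifts.
print(log_ratios(baseline, "VIP"))
print(log_ratios(expanded, "VIP"))

print(ei_ratio(baseline, {"L2/3 IT"}), ei_ratio(expanded, {"L2/3 IT"}))
```

Testing each proportion independently would report a spurious decrease in SST and VIP interneurons in this example, which is the artifact that compositional approaches are designed to avoid.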

Next, we identified species-specific genes by 
comparing consensus cell-type expression for 
each species to all other primates. Human con- 
sensus types had a broad range (fewer than 100 
to more than 1000) of statistically significant 
DEGs (Fig. 5C and table S7) that represented 1 to 
8% of expressed genes (fig. S14B). Excitatory 
types in deep layers (IT and non-IT) had the 
most human-specific DEGs (hDEGs), including 
L5/6 NP_2, L6 CT_1, and both subtypes of L5/6 
IT CAR3 neurons (Fig. 5C). Non-neuronal types 
had the fewest hDEGs despite having the 
lowest correlated expression between species 
(Fig. 3, D and E). Two factors contributed to
this apparent inconsistency. First, non-neuronal 
cells expressed fewer transcripts than neurons, 
and the number of hDEGs as a proportion of 
median expressed genes was similar for non- 
neuronal and some neuronal types (fig. S14B). 
Second, non-neuronal cells were more variable
across individuals than neurons (fig. S4C),
and there was reduced power to detect smaller
expression changes. Indeed, non-neuronal and
neuronal types with fewer hDEGs had larger
median fold changes that were statistically
significant despite high interindividual variation
(fig. S14B).

Many species-specific DEGs were restricted 
to one or a few cell types, particularly for great 
apes (fig. S14C). The cell-type specificity of 
DEGs was not simply a result of expression 
changes in marker genes; it was also the result 
of selective changes in broadly expressed genes 
(fig. S14D). hDEGs had a median fourfold change
in expression, whereas a few metabolism-related 
genes changed expression by 20-fold or more 
in most cell types (fig. S14E). The same genes 
were often differentially expressed in multiple 
species (fig. S14F) but in different cell types, 
and highly divergent (>10-fold) genes were 
usually found across all species. In situ measurement
of two hDEGs, COL11A1 and DACH1,
validated enriched expression in human Chandelier
and L5/6 IT CAR3 neurons, respectively
(fig. S14G). Species-specific DEGs were en- 
riched in four major pathways: ribosomal pro- 
cessing, ECM, axon structure, and the synapse 
(Fig. 5, D and E). Ribosomal processing was 
primarily associated with interneurons in 
humans and all cell types in chimpanzees, 
macaques, and marmosets. ECM-associated 
DEGs, including several laminin genes, were
specific to the VLMC_1 consensus type in humans,
chimpanzees, and marmosets (fig. S14H)
and have the potential to alter the blood-brain
barrier, as shown in a mouse model of pericyte
dysfunction (44). Hundreds of axonal
and synaptic genes were differentially ex- 
pressed in most cell types in all species, and 
this suggests extensive molecular remodeling 
of connectivity and signaling during primate 
evolution. 


Enrichment of HARs and hCONDELs near 
human DEGs 


Genes may change expression between species 
because of neutral or adaptive evolution. To 
investigate which hDEGs may be under posi- 
tive selection, we linked hDEGs to human- 
specific genomic sequence changes. Because 
hDEGs are differentially expressed in only one 
or a few consensus cell types, expression changes 
are likely caused by sequence modifications to 
regulatory regions that can alter transcription 
in select cell types. We examined three pre- 
viously identified classes of genomic regions 
that have changed along the human lineage: (i) 
HARs that are highly conserved across mam- 
mals and have higher substitution rates in the 
human lineage (4), (ii) hCONDELs that are 
highly conserved across mammals and deleted 
in humans (45, 46), and (iii) human ancestor 
quickly evolved regions (HAQERs) that are the 
fastest evolved regions in the human genome 
(47). We found that HARs and hCONDELs are
significantly [false discovery rate (FDR) <0.05]
enriched near hDEGs in many consensus cell 
types (Fig. 6A and fig. S15, A and B). The pro- 
portion of hDEGs near HARs and hCONDELs 
is highest for non-neuronal consensus types 
such as VLMCs (VLMC_1), microglia (Micro- 
PVM_1), and oligodendrocytes (Oligo_1), likely 
owing to the larger intronic and flanking inter- 
genic regions of hDEGs in these cell types (fig. 
S15D). We found some enrichment of HARs 
and hCONDELs near NHP-specific DEGs (fig. 
S16), which supports previous findings that 
show that accelerated genomic regions in dif- 
ferent primate lineages cluster near similar 
genes (48). 
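
An enrichment test of this kind, followed by FDR control across consensus types, can be sketched in Python as below. The sketch assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment, which may differ from the statistical procedure actually used in the study; all gene counts and P values are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K near a region, n drawn).

    Here n would be the number of hDEGs in a cell type and k the number
    of those hDEGs that lie near a HAR or hCONDEL.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 10,000 expressed genes, 1,500 of them near a HAR,
# 200 hDEGs in one consensus type, 50 of which are near a HAR.
p = hypergeom_upper_tail(k=50, K=1500, n=200, N=10000)
print(p < 0.05)

# Hypothetical P values for four consensus types, adjusted jointly.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
print([q < 0.05 for q in fdr])
```

A gene-count test like this one does not correct for gene length, which is relevant here because the text attributes part of the signal in non-neuronal types to larger intronic and flanking intergenic regions.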

By contrast, HAQERs are not enriched near 
hDEGs in any consensus cell type (fig. S15C). 
Unlike HARs and hCONDELs, HAQERs need 
not be conserved across other species and po- 
tentially include genomic regions that were 
previously nonfunctional but that acquired 
new functions in humans. Therefore, we tested 
whether HAQERs were enriched near genes 
with differential expression between humans 
and chimpanzees, without regard for their ex- 
pression in other primates, and found signifi- 
cant enrichment for the OPC and L5/6 IT CAR3 
subclasses (fig. S17). HARs and hCONDELs are
also enriched near genes with differential ex- 
pression between humans and chimpanzees 
in multiple cell subclasses, reflecting the en- 
richment of HARs and hCONDELs near hDEGs. 

Because hDEGs are highly enriched for synap- 
tic genes (Fig. 5D), we asked whether a subset 
of hDEGs that are near HARs or hCONDELs 
and are potentially adaptive are associated with 
specialized localizations or molecular functions 
of the synapse. By performing gene set enrich- 
ment analysis using SynGO (28), we found a 
significant enrichment of hDEGs among SynGO 
genes compared with all expressed genes (P < 
10°*°) and a further enrichment of hDEGs near 
HARs and hCONDELs among SynGO genes 
compared with all hDEGs (P < 10°”) (Fig. 6B, 
fig. S18, and table S8). Among the most- 
enriched SynGO terms were synapse assembly, 
synaptic membrane organization, and trans- 
synaptic signaling. Other SynGO terms were 
not enriched, including synaptic transport, 
metabolism, cytoskeleton, and vesicle exocyto- 
sis machinery (Fig. 6B, figs. S18 and S19, and 
table S8). We also found a significant enrich- 
ment of hDEGs, and those near HARs and 
hCONDELs, within gene families that encode 
synaptic adhesion molecules (P < 10~°) (Fig. 6C, 
figs. S20 and S21, and table S8). 

We next examined how synaptic genetic 
programs have changed expression in specific 
human consensus types. Some gene families 
(neurexins, interleukin receptors, FLRT proteins, 
and Trk receptors) mainly changed in excitatory 
types in deep cortical layers, whereas other 
families (neuroligins, protocadherins, latrophilins,
and immunoglobulin superfamily DCC receptors)
primarily changed in inhibitory types
(Fig. 6C and fig. S20). PVALB interneurons and 
deep-layer excitatory neurons are known to 
establish specific microcircuits in deeper cor- 
tical layers (49), and those types show comple- 
mentary expression changes in ephrin ligands 
and receptors, respectively (Fig. 6C and fig. 
S20B). Moreover, although teneurins, PTP re- 
ceptors, and EPH receptors include hDEGs in 
almost all consensus types (Fig. 6C), specific 
family members are hDEGs only in a subset of 
types. For instance, 13 genes within these families
(EPHA3, EPHA4, EPHA5, EPHA7, EPHB6,
PIPRE, PTPRG, PTPRE, PTPRO, PTPRS, PTPRT,
PTPRU, and TENM3) changed expression in
only one or two consensus types within the 
14 consensus types of L5/6 excitatory neurons 
(Fig. 6D and figs. S20B and S22, A and B). 
Similarly, several genes (CDH1, CDH2, CDH24, 
EFNA5, EFNB2, IGSF9B, LGI1, LGI2, LRFN5,
and SLITRK4) that only diverged in expres- 
sion in inhibitory interneurons also showed 
selective changes in only one or two consen- 
sus types (figs. S20B and S22C). Taken together, 
our data highlight human specializations of 
synaptic gene programs that are highly local- 
ized to specific cell types and may underlie 
differences in synaptic connectivity in specific 
microcircuits. 

We leveraged existing data to identify human- 
specific sequence changes in regulatory regions 
linked to hDEGs that may drive differential 
expression in select cell types. For example, 
PTPRG is a member of the PTP receptor fam- 
ily that acts as presynaptic organizers for 
synapse assembly (50). Genetic variants in 
PTPRG have been associated with neuropsychiatric
disorders, and Ptprg mutant mice
show memory deficits, supporting an important
role for PTPRG in cognitive function (51-54).
We found that PTPRG was widely expressed 
across cell types (fig. S23A) and had lower ex- 
pression in humans than in NHPs in four 
consensus types: one excitatory neuron type 
(L5 ET_2), microglia (Micro-PVM_1), and two 
inhibitory neuron types (VIP_2 and VIP_6)
(Fig. 6E and fig. S23B). PTPRG is located near 
HARsv2_1818 (chr3:61283266-61283416, hg38), 
which has decreased enhancer activity from 
the human sequence compared with that from 
the orthologous chimpanzee sequence (14). Of
note, a 5-kb genomic interval that includes 
HARsv2_1818 has been shown to interact with 
the promoter of PTPRG, specifically in excita- 
tory neurons but not in inhibitory neurons or 
microglia (15, 16). This raises the possibil- 
ity that decreased enhancer activity from 
HARsv2_1818 in humans may have decreased 
PTPRG expression specifically in the excitatory 
neuron consensus type L5 ET_2 and that 
separate regulatory mechanisms may decrease 
PTPRG expression in microglia and specific 
inhibitory neuron consensus types. In support 
of this hypothesis, there is a base-pair substitution
in the human HARsv2_1818 sequence
that removes a binding site for TWIST1, a
basic helix-loop-helix transcription factor. We
found that TWIST1 is expressed predominantly
in excitatory neuron consensus type L5 ET_2 
compared with microglia or inhibitory neuron 
consensus types (Fig. 6E and fig. S23C), further 
suggesting that human-specific sequence 
changes in HARsv2_1818 may specifically de- 
crease PTPRG expression in L5 ET_2. We ex- 
tended this analysis to link 112 HARs to 92 
hDEGs in neurons using existing data (15, 16), 
and we posit that genomic interaction data from 
specific cell types may reveal additional genes 
that may be regulated by human-specific se- 
quence changes. 


Discussion 


Transcriptomic profiling of more than 570,000 
nuclei from the MTG region of primate neo- 
cortex reveals a notably conserved cellular 
architecture across humans and four NHPs: 
chimpanzees, gorillas, macaques, and marmo- 
sets. Humans and the other great apes have 
nearly identical proportions and laminar distri- 
butions of consensus types, whereas marmosets 
are the most distinct, with markedly increased 
proportions of L5 ET and L5/6 IT CAR3 ex- 
citatory neurons and Chandelier interneurons. 
Great apes have similar proportions of two 
major subtypes of L5/6 IT CAR3 neurons that 
have high or low CUX2 expression and distinct 
positions in layers 5 and 6, and marmosets have 
mostly high-CUX2 neurons. Unlike those in pri- 
mates, L5/6 IT CAR3 neurons in mice express 
markers of both subtypes. These neurons are 
transcriptomically homogeneous across the 
mouse cortex yet project to diverse cortical 
targets, including proximal areas and homo- 
typic areas in the contralateral hemisphere 
(55). High-CUX2 neurons are selectively enriched 
in language-related regions in the human temporal
and parietal cortex (MTG, A1, and ANG)
(21), and these neurons may have distinct con- 
nectivity and contribute to the functional spe- 
cializations of these regions. 

Cell-type expression differences are more pro- 
nounced than proportion differences and mostly 
parallel evolutionary distances. One notable ex- 
ception is that neuronal expression diverged 
more rapidly in the human lineage (56) than 
in the chimpanzee and gorilla lineages. In all 
primates, evolutionary expression changes are 
substantially accelerated in microglia, astrocytes, 
and oligodendrocytes compared with neurons 
and OPCs, even after accounting for higher 
variability of glial expression between indivi- 
duals. In addition, human glia express more 
highly divergent genes than chimpanzee or 
gorilla glia, suggesting faster divergence of human 
microglia and astrocytes (9) as well as oligoden- 
drocytes (8) among great apes. Finally, we ob- 
served human-specific changes in isoform usage 
of genes that often have conserved transcript 
amounts. This highlights the importance of
profiling full-length transcripts in molecular 
studies of cellular diversity to identify a more 
comprehensive set of genes with potentially 
functional changes. 

Humans and NHPs have hundreds of DEGs 
that are specific to one or a few consensus cell 
types and are enriched in molecular pathways 
related to ribosomal processing, cell connec- 
tivity, and synaptic function. Human-specific 
changes in synaptic gene expression are complex, 
and distinct families of genes are differentially 
expressed in select neuronal and non-neuronal 
cell types. For example, ephrin molecules spe- 
cifically differ in PVALB inhibitory cell types, 
whereas their cognate receptors (EPH receptors) 
change prominently in deep layer excitatory 
neurons. Importantly, ephrin-EPH receptor sig- 
naling has been shown to promote synapto- 
genesis in the mouse developing cortex (57, 58). 
Because PVALB interneurons and excitatory 
neurons form cell type-specific patterns of 
connectivity (49), the differential expression of 
ephrins and EPH receptors could reflect pri- 
mate species differences in the formation of 
inhibitory microcircuits that involve specific 
subtypes of PVALB interneurons and excita- 
tory neurons. Also, a substantial proportion 
of synaptic cell-adhesion genes showed down- 
regulated expression in human neurons, particu- 
larly in gene families that encode PTP receptors, 
including PTPRG, and EPH receptors. Some 
studies have proposed roles in synapse elimina- 
tion for members of highly divergent synaptic 
families, including Pcdh10, ephrin-B1, and 
ephrin-A2 (59, 60). In such a case, reduced ex- 
pression of negative regulators of synaptic 
assembly in human neurons could lead to an 
enhanced ability to form synaptic connec- 
tions, potentially underlying the greater num- 
ber of synapses per neuron that are observed 
in the human cortex compared with that of 
NHPs (61).

Emerging evidence demonstrates the criti- 
cal role that non-neuronal cell types play in 
cortical development, network function, and 
behavior (62-67). Previous molecular assays 
have identified a role for ErbB4-mediated sig- 
naling in astrogenesis, astrocyte-neuron com- 
munication, and astrocyte-induced neuronal 
remodeling, potentially through both paracrine 
and autocrine signaling (68-70). We observed 
changes in the expression of the ERBB4 re- 
ceptor and its cognate ligands NRG2 and NRG3 
in human astrocytes compared with chimpanzee 
and gorilla astrocytes. Altogether, these findings 
point toward finely regulated molecular speciali- 
zations underlying neuronal and glial communi- 
cation in the human cortex. Our data also serve 
as a resource for future investigation of human- 
enriched astrocyte and microglia gene programs 
associated with disease. 

Deeper sampling of cells and individuals will 
be needed to disentangle the genetic and 
environmental effects that drive cell-type specializations
and whether expression differences
represent changes in cell types or cell
states. Moreover, molecular and morphologi- 
cal specializations of human cortical neurons 
may be linked to macroscale anatomical changes 
given that the number of synapses per neuron 
increases predictably with brain size across 
human and non-human primates (71). Thus, a
phylogenetically broader set of mammals, in- 
cluding large-brained, nonprimate species, will 
be needed to differentiate between cellular 
features that result from human brain scal- 
ing versus specialized cognitive capacities, 
such as language. 

Cell type-specific evolutionary changes in 
gene expression are likely driven by sequence 
changes to regulatory regions that can be active 
with high spatial and temporal precision. This 
is supported by prior studies of genome- 
sequence evolution in humans and other 
species that estimated that more than 80% of 
adaptive sequence change is likely regulatory 
(45, 72, 73). Indeed, we find that previously 
identified genomic regions that have human- 
specific sequence changes, such as HARs and 
hCONDELs, are enriched near hDEGs. This 
association is observed for both neuronal and 
non-neuronal consensus types. In addition to 
well-described changes in the number and 
function of neurons in the human brain (74), 
many non-neuronal cell types also undergo 
comparable changes in the human lineage 
(75, 76). Moreover, hDEGs, including those 
near HARs and hCONDELs, have been found 
to play critical roles in synapse establishment, 
elimination, and maintenance when expressed 
by neuronal and non-neuronal cells (77). Asso- 
ciating genomic regions with signatures of 
selection in humans to hDEGs provides a 
framework to link regulatory sequence changes 
to human-specific cellular and circuit-level phe- 
notypes through expression changes in select 
cell types. 


Materials and methods 
Tissue specimens from primate species 
Human postmortem tissue specimens 


Deidentified postmortem adult human brain 
tissue was obtained after receiving permission 
from the deceased’s next of kin. Tissue col- 
lection was performed in accordance with the 
provisions of the United States Uniform Ana- 
tomical Gift Act of 2006 described in the 
California Health and Safety Code section 7150 
(effective 1 January 2008) and other applicable 
state and federal laws and regulations. The 
Western Institutional Review Board reviewed 
tissue collection procedures and determined 
that they did not constitute human subject 
research requiring institutional review board 
(IRB) review. 

Male and female individuals 18 to 68 years 
of age with no known history of neuropsychiatric 
or neurological conditions were considered for 
inclusion in the study. Routine serological 
screening for infectious disease (HIV, hepatitis B, 
and hepatitis C) was conducted using individ- 
ual blood samples, and individuals testing posi- 
tive for infectious disease were excluded from 
the study. 

Specimens were screened for RNA quality 
and samples with average RNA integrity (RIN) 
values ≥7.0 were considered for inclusion in 
the study. Postmortem brain specimens were 
processed as previously described (77) 
(dx.doi.org/10.17504/protocols.io.bf4ajqse). Briefly, coronal 
brain slabs were cut at 1-cm intervals, 
frozen in dry-ice-cooled isopentane, and transferred 
to vacuum-sealed bags for storage at 
-80°C until the time of further use. To isolate 
the MTG, tissue slabs were briefly transferred 
to -20°C, and the region of interest 
was removed and subdivided into smaller blocks 
on a custom temperature-controlled cold table. 
Tissue blocks were stored at -80°C in vacuum-sealed 
bags until later use. 


Chimpanzee and gorilla tissue specimens 


Chimpanzee tissue was obtained from the 
National Chimpanzee Brain Resource (sup- 
ported by NIH grant NS092988). Gorilla sam- 
ples were collected postmortem after naturally 
occurring death or euthanasia of the animals 
for medical conditions at various zoos. Gorilla 
and chimpanzee brains were divided into 2-cm 
coronal slabs, flash-frozen using dry-ice-cooled 
isopentane, liquid nitrogen, or a -80°C freezer, 
and then stored in freezer bags at -80°C. 
Tissues from the MTG were removed from ap- 
propriate slabs, which were maintained on dry 
ice during dissection and were shipped to the 
Allen Institute overnight on dry ice. 


Macaque tissue specimens 


Macaque tissue samples were obtained from 
the University of Washington National Primate 
Resource Center under a protocol approved 
by the University of Washington Institutional 
Animal Care and Use Committee. Immediately 
after euthanasia, macaque brains were removed 
and transported to the Allen Institute in artificial 
cerebral spinal fluid equilibrated with 
95% O2 and 5% CO2. Upon arrival at the Allen 
Institute, brains were divided down the mid- 
line, and each hemisphere was subdivided 
coronally into 0.5-cm slabs. 

Slabs were flash frozen in dry-ice-cooled iso- 
pentane, transferred to vacuum-sealed bags, 
and stored at -80°C. MTG was removed from 
brain slabs as described above for human 
tissues. 


Marmoset tissue specimens 


Marmoset experiments were approved by and 
in accordance with Massachusetts Institute of 
Technology IACUC protocol number 051705020. 
Adult marmosets (1.5 to 2.5 years old, three 
individuals) were deeply sedated by intramuscular 
injection of ketamine (20 to 40 mg kg⁻¹) 
or alfaxalone (5 to 10 mg kg⁻¹), followed by 
intravenous injection of sodium pentobarbital 
(10 to 30 mg kg⁻¹). When the pedal withdrawal 
reflex was eliminated and/or the respiratory 
rate was diminished, animals were transcardially 
perfused with ice-cold sucrose-HEPES 
buffer (78). Whole brains were rapidly extracted 
into fresh buffer on ice. Sixteen 2-mm 
coronal blocking cuts were rapidly made using a 
custom-designed marmoset brain matrix. Slabs 
were transferred to a dish with ice-cold dissection 
buffer (78). All regions were dissected 
using a marmoset atlas as reference (79), 
snap-frozen in liquid nitrogen or dry-ice-cooled 
isopentane, and stored in individual 
microcentrifuge tubes at -80°C. 

Temporal lobe dissections targeted area TE3 
and TPO on the lateral temporal surface. Though 
a true homology to catarrhine MTG may not exist 
in marmosets, these areas in marmoset form 
part of the temporal lobe association cortex. 
Moreover, on the basis of tract tracing con- 
nectivity studies (80), TE3 and TPO participate 
in the “default mode network,” a functionally 
coupled network of higher-order association 
cortex that includes MTG in other species (87). 
Cortical area DLPFC targeted the dorsolateral 
surface of PFC, approximately 2.5 to 3 mm from 
the frontal pole. ACC/PFCm included medial 
frontal cortex anterior to the genu of the corpus 
callosum. M1 dissections were stained with 
fluorescent Nissl and targeted the hand or trunk 
region. S1 likely sampled all primary somatosensory 
areas (A3, A1/2). A1 dissections targeted 
primary auditory area but likely include some 
rostral and caudal parabelt cortex. V1 dissec- 
tions were collected on the dorsal bank of the 
calcarine sulcus approximately 4 to 6 mm from 
the posterior pole. 


Tissue processing and snRNA-seq 
SMART-seq v4 nucleus isolation and 
sorting (human, chimpanzee, and gorilla) 


Vibratome sections of MTG blocks were stained 
with fluorescent Nissl permitting microdissection 
of individual cortical layers as previously de- 
scribed (dx.doi.org/10.17504/protocols.io.bq6ymzfw). 
Nucleus isolation was performed as described 
(dx.doi.org/10.17504/protocols.io.ztqf6mw). 
Briefly, single-nucleus suspensions were stained 
with DAPI (4’,6-diamidino-2-phenylindole dihy- 
drochloride, ThermoFisher Scientific, D1306) at 
a concentration of 0.1 µg/ml. Controls were incubated 
with mouse IgG1κ-PE isotype control 
(BD Biosciences, 555749, 1:250 dilution) or DAPI 
alone. To discriminate between neuronal and 
non-neuronal nuclei, samples were stained with 
mouse anti-NeuN conjugated to PE (FCMAB317PE, 
EMD Millipore) at a dilution of 1:500. Single- 
nucleus sorting was carried out on either a BD 
FACSAria II SORP or BD FACSAria Fusion 
instrument (BD Biosciences) using a 130-µm 
nozzle and BD Diva software v8.0. A standard 
gating strategy based on DAPI and NeuN stain- 
ing was applied to all samples as previously 
described (17). Doublet discrimination gates 
were used to exclude nuclei multiplets. Indi- 
vidual nuclei were sorted into 96-well plates, 
briefly centrifuged at 1000 rpm, and stored 
at -80°C. 


SMART-seq v4 RNA-seq 


The SMART-Seq v4 Ultra Low Input RNA Kit 
for Sequencing (Takara no. 634894) was used 
per the manufacturer’s instructions. Standard 
controls were processed with each batch of 
experimental samples as previously described. 
After reverse transcription, cDNA was ampli- 
fied with 21 PCR cycles. The NexteraXT DNA 
Library Preparation (Illumina FC-131-1096) kit 
with NexteraXT Index Kit V2 Sets A-D (FC-131- 
2001, 2002, 2003, or 2004) was used for se- 
quencing library preparation. Libraries were 
sequenced on an Illumina HiSeq 2500 instru- 
ment (Illumina HiSeq 2500 System, RRID: 
SCR_016383) using Illumina High Output V4 
chemistry. The following instrumentation soft- 
ware was used during data generation work- 
flow: SoftMax Pro v6.5; VWorks v11.3.0.1195 
and v13.1.0.1366; Hamilton Run Time Con- 
trol v4.4.0.7740; Fragment Analyzer v1.2.0.11; 
and Mantis Control Software v3.9.7.19. 


SMART-seq v4 gene expression 
quantification 


For human, raw read (fastq) files were aligned 
to the GRCh38 genome sequence (Genome 
Reference Consortium, 2011) with the RefSeq 
transcriptome version GRCh38.p2 (RefSeq, 
RRID:SCR_003496, current as of 13 April 2015) 
and updated by removing duplicate Entrez 
gene entries from the gtf reference file for 
STAR processing, as previously described (7). 
For chimpanzee and gorilla, the Clint_PTRv2 
and Susie3 NCBI reference genomes were 
used for alignment, respectively. For align- 
ment, Illumina sequencing adapters were 
clipped from the reads using the fastqMCF 
program (from ea-utils). 

After clipping, the paired-end reads were 
mapped using Spliced Transcripts Alignment 
to a Reference (STAR v2.7.3a, RRID:SCR_015899) 
using default settings. Reads that did not map 
to the genome were then aligned to synthe- 
tic construct (i.e., ERCC) sequences and the 
Escherichia coli genome (version ASM584v2). 
Quantification was performed using summarizeOverlaps 
from the R package GenomicAlignments 
v1.18.0. Gene expression was calculated 
as counts per million (CPM) of exonic plus in- 
tronic reads. 
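
As an illustration of the CPM calculation described above, the following minimal Python sketch sums exonic and intronic counts per gene and scales them to one million reads. It is not the pipeline used in the study (which relied on summarizeOverlaps in R); the function name and gene names are hypothetical, and the scaling by total assigned reads per nucleus is an assumption.

```python
def cpm_exon_plus_intron(exonic, intronic):
    """Return counts per million (CPM) per gene from exonic plus intronic counts.

    Both arguments map gene name -> read count for a single nucleus.
    """
    genes = set(exonic) | set(intronic)
    combined = {g: exonic.get(g, 0) + intronic.get(g, 0) for g in genes}
    total = sum(combined.values())
    if total == 0:
        # No assigned reads: report zeros rather than dividing by zero.
        return {g: 0.0 for g in genes}
    return {g: count * 1_000_000 / total for g, count in combined.items()}


# Hypothetical counts for one nucleus (gene names are placeholders).
exonic = {"GENE_A": 100, "GENE_B": 50}
intronic = {"GENE_A": 50, "GENE_C": 800}
cpm = cpm_exon_plus_intron(exonic, intronic)
```

Because every gene is divided by the same total, the CPM values of a nucleus always sum to one million, which makes nuclei with different sequencing depths comparable.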


10x RNA-seq (human, chimpanzee, gorilla, 
and macaque) 


Nucleus isolation for 10x Chromium snRNA-seq 
was conducted as described 
(dx.doi.org/10.17504/protocols.io.y6rfzd6). Gating was as 
described for SSv4 above. NeuN⁺ and NeuN⁻ 
nuclei were sorted into separate tubes and 
were pooled at a defined ratio (90% NeuN⁺, 
10% NeuN⁻) after sorting. Sorted samples were 
centrifuged, frozen in a solution of 1X PBS, 1% 
BSA, 10% DMSO, and 0.5% RNasin Plus RNase 
inhibitor (Promega, N2611), and stored at 
-80°C until the time of 10x chip loading. 

Immediately before loading on the 10x Chro- 
mium instrument, frozen nuclei were thawed 
at 37°C, washed, and quantified for loading 
as described (dx.doi.org/10.17504/protocols.io.nx3dfqn). 
Samples were processed using 
the 10x Chromium Single-Cell 3’ Reagent Kit v3 
following the manufacturer’s protocol. Gene 
expression was quantified using the default 
10x Cell Ranger v3 (Cell Ranger, RRID:SCR_017344) 
pipeline. 

Reference genomes included the modified 
genome annotation described above for SMART- 
seq v4 quantification (human), Clint_PTRv2 
(chimpanzee), Susie3 (gorilla), and Mmul_10 
(rhesus macaque). Introns were annotated 
as “mRNA,” and intronic reads were included 
in expression quantification. 


10x RNA-seq (marmoset) 


Unsorted single-nucleus suspensions from fro- 
zen marmoset samples were generated as in 
(10). GEM generation and library preparation 
were performed following the manufacturer’s 
protocol (10X Chromium single-cell 3’ v.3, protocol 
version #CG000183_ChromiumSingleCell3'_v3_UG_Rev-A). 
Raw sequencing reads 
were aligned to the CJ1700 reference. Reads 
that mapped to exons or introns were assigned 
to annotated genes. 


RNA-seq processing and clustering 
Cell-type label transfer 


Human MTG and M1 reference taxonomy sub- 
class labels (12, 27) were transferred to nuclei 
in the current MTG dataset using Seurat’s label 
transfer (3000 high variance genes using the 
“vst” method then filtered through exclusion 
list). For human label mapping to other species, 
higher variance genes were included from a list 
of orthologous genes [14,870 genes; downloaded 
from NCBI Homologene 
(https://www.ncbi.nlm.nih.gov/homologene) in November 
2019; RRID:SCR_002924]. This was carried out 
for each species and RNA-seq modality dataset; 
for example, human-Cv3 and human-SSv4 were 
labeled independently. Each dataset was sub- 
divided into five neighborhoods—IT and non- 
IT excitatory neurons, CGE- and MGE-derived 
interneurons, and non-neuronal cells—based 
on marker genes and transferred subclass labels 
from published studies of human and mouse 
cortical cell types and cluster grouping relation- 
ships in a reduced dimensional gene expression 
space. MTG and M1 subclass labels were highly 
consistent for all neighborhoods and species 
(adjusted Rand index 0.88 to 0.99), and a final 
set of labels was manually curated using addi- 
tional information, such as layer dissections. 


Filtering low-quality nuclei 


SSv4: nuclei were included for analysis if they 
passed all QC criteria: 

>30% cDNA longer than 400 base pairs 

>500,000 reads aligned to exonic or intronic 
sequence 

>40% of total reads aligned 

>50% unique reads 

>0.7 TA nucleotide ratio 
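
The five SSv4 criteria listed above can be expressed as a single predicate. The sketch below is a hypothetical illustration, not the authors' code: the record type and its field names are assumptions, and percentages are assumed to be stored on a 0 to 100 scale.

```python
from dataclasses import dataclass


@dataclass
class NucleusQC:
    """Per-nucleus QC metrics (field names are hypothetical)."""
    pct_cdna_over_400bp: float   # percent of cDNA longer than 400 base pairs
    reads_exon_intron: int       # reads aligned to exonic or intronic sequence
    pct_reads_aligned: float     # percent of total reads aligned
    pct_unique_reads: float      # percent unique reads
    ta_ratio: float              # TA nucleotide ratio


def passes_ssv4_qc(n: NucleusQC) -> bool:
    """A nucleus is kept only if it passes all five thresholds."""
    return (
        n.pct_cdna_over_400bp > 30
        and n.reads_exon_intron > 500_000
        and n.pct_reads_aligned > 40
        and n.pct_unique_reads > 50
        and n.ta_ratio > 0.7
    )


good = NucleusQC(45.0, 1_200_000, 85.0, 70.0, 0.9)
shallow = NucleusQC(45.0, 300_000, 85.0, 70.0, 0.9)  # too few aligned reads
kept = [n for n in (good, shallow) if passes_ssv4_qc(n)]
```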

QC was then performed at the neighborhood 
level. Neighborhoods were integrated together 
across all species and modalities; for 
example, deep excitatory neurons from the human-Cv3, 
human-SSv4, chimp-Cv3, and remaining 
datasets were integrated using Seurat integration 
functions with 2000 high variance genes 
from the orthologous gene list. Integrated 
neighborhoods were Louvain clustered into 
over 100 meta cells, and low-quality meta cells 
were removed from the dataset based on 
relatively low UMI or gene counts (which in- 
cluded glia and neurons with more than 500 
and 1000 genes detected, respectively), pre- 
dicted doublets using DoubletFinder (82) and 
default parameters (included nuclei with dou- 
blet scores under 0.3), and/or subclass label 
prediction metrics within the neighborhood 
(i.e., excitatory labeled nuclei that clustered 
with majority inhibitory or non-neuronal nuclei). 


RNA-seq clustering 


Nuclei were normalized using SCTransform 
(19), and neighborhoods were integrated to- 
gether within a species and across individuals 
and modalities by identifying mutual nearest 
neighbor anchors and applying canonical cor- 
relation analysis as implemented in Seurat 
(18). For example, deep excitatory neurons from 
human-Cv3 were split by individual and inte- 
grated with the human-SSv4 deep excitatory 
neurons. Integrated neighborhoods were Louvain 
clustered into more than 100 meta cells. Meta cells 
were then merged with their nearest neighboring 
meta cell until merging criteria were satisfied, 
which is a split-and-merge approach that 
has been previously described (72). The remain- 
ing clusters underwent further QC to exclude 
low-quality and outlier populations. These ex- 
clusion criteria were based on irregular group- 
ings of metadata features that resided within a 
cluster. 


Robustness tests of cell subclasses 
using MetaNeighbor 


MetaNeighbor v1.12 (38, 39) was used to pro- 
vide a measure of neuronal and non-neuronal 
subclass and cluster replicability within and 
across species. We subset snRNA-seq datasets 
from each species to the list of common orthologs 
before further analysis. For each assessment, 
we identified highly variable genes using 
the get_variable_genes function from Meta- 
Neighbor. To identify homologous cell types, 
we used the MetaNeighborUS function, with 
the fast_version and one_vs_best parameters 
set to TRUE. The one_vs_best parameter iden- 
tifies highly specific cross-dataset matches 
by reporting the performance of the closest 
neighboring cell type over the second closest 
as a match for the training cell type, and the 
results are reported as the relative classifica- 
tion specificity (AUROC). This step identified 
highly replicable cell types within each species 
and across each species pair. All 24 subclasses 
are highly replicable within and across species 
(one_vs_best AUROC of 0.96 within species 
and 0.93 across species in fig. S4A). 


Defining cross-species consensus cell types 


Although cell type clusters are highly replica- 
ble within each species (one_vs_best AUROC 
of 0.93 for neurons and 0.87 for non-neurons), 
multiple transcriptionally similar clusters 
mapped to each other across each species pair 
(average cross-species one_vs_best AUROC of 
0.76). To build a consensus cell type taxonomy 
across species, we defined a cross-species clus- 
ter as a group of clusters that are either re- 
ciprocal best hits or clusters with AUROC >0.6 
in the one_vs_best mode in at least one pair of 
species. This lower threshold (AUROC > 0.6) 
reflects the high difficulty and/or specificity of 
testing only against the best performing other 
cell type. We identified 86 cross-species clus- 
ters, each containing clusters from at least two 
primates. Any unmapped clusters were assigned 
to one of the 86 cross-species clusters based on 
their transcriptional similarity. For each un- 
mapped cluster, the top 10 of their closest 
neighbors were identified using MetaNeighborUS 
one_vs_all cluster replicability scores, and the 
unmapped cluster was assigned to the cross- 
species cluster in which a strict majority of its 
nearest neighbors belong. For clusters with no 
hits, this was repeated using the top-20 closest 
neighbors, still requiring a strict majority to 
assign a cross-species type. Five hundred ninety- 
four clusters present in five primates (i.e., union) 
mapped to 86 cross-species clusters, with 493 
clusters present across 57 consensus cross- 
species clusters shared by all five primates 
(table S6). This is described in more detail in 
our companion manuscript (83). Additional 
sampling of species and developmental time 
points will be needed to distinguish between 
transcriptomic specializations of conserved 
cell types and the emergence of closely related 
but distinct cell types. In this study, the 101 clus- 
ters with initial homologies across fewer than 
five species were assigned to the most similar 
of the 57 consensus types. 
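
The strict-majority assignment of unmapped clusters described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the MetaNeighbor implementation: neighbors are assumed to be pre-ranked by one_vs_all replicability score, all names are hypothetical, and neighbors that are themselves unmapped are assumed to count toward the denominator without casting a vote.

```python
from collections import Counter


def assign_unmapped(neighbors_ranked, cluster_to_cross, ks=(10, 20)):
    """Assign an unmapped cluster to a cross-species cluster by majority vote.

    neighbors_ranked: cluster ids sorted from most to least similar.
    cluster_to_cross: maps already-mapped cluster ids to cross-species ids.
    Tries the top 10 neighbors first, then the top 20; returns None if no
    cross-species cluster holds a strict majority either time.
    """
    for k in ks:
        top = neighbors_ranked[:k]
        votes = Counter(cluster_to_cross[c] for c in top if c in cluster_to_cross)
        if not votes:
            continue
        label, n_votes = votes.most_common(1)[0]
        if n_votes > len(top) / 2:  # strict majority of the neighbors examined
            return label
    return None


# Hypothetical mapping: clusters a0..a11 belong to X1, b0..b9 to X2.
mapping = {f"a{i}": "X1" for i in range(12)}
mapping.update({f"b{i}": "X2" for i in range(10)})

clear_case = [f"a{i}" for i in range(6)] + [f"b{i}" for i in range(4)]
needs_top20 = (
    [f"a{i}" for i in range(5)] + [f"b{i}" for i in range(5)]
    + [f"a{i}" for i in range(5, 11)] + [f"b{i}" for i in range(5, 9)]
)
tied = (
    [f"a{i}" for i in range(5)] + [f"b{i}" for i in range(5)]
    + [f"a{i}" for i in range(5, 10)] + [f"b{i}" for i in range(5, 10)]
)
```

In this toy example, `clear_case` resolves within the top 10 neighbors, `needs_top20` is tied at 10 neighbors but resolves at 20, and `tied` remains unassigned.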

An alternative approach for consensus clus- 
tering was used to assess the robustness of 
homologous cell type clusters identified by 
MetaNeighbor. For each of the five cell-type 
neighborhoods (non-neuronal, MGE- and CGE- 
derived interneurons, and IT- and non-IT- 
projecting excitatory neurons), we built a 
reference with four primate datasets and used 
the fifth primate dataset as query for cell type 
annotation using scArches (40). We built each 
reference dataset using 2000 highly variable 
genes, trained a model on the reference using 
scPoli (84), and mapped the query cells onto 
the reference data (fig. S12). scPoli learns a set 
of cell-type prototypes from the latent cell 
representation of the reference data (fig. S12, 
C and D). The cells in the query dataset were 
annotated based on their closest cell-type 
prototype in the reference data (fig. S12E), and 
the classification uncertainty was estimated by 
Euclidean distance from this prototype (fig. 
S12F). Query cells typically mapped to cell-type 
prototypes identified in the reference data with 
low label transfer uncertainty, highlighting the 
robustness of the primate MTG consensus taxo- 
nomy. Cell-type labels predicted by scPoli were 
largely consistent with the consensus cell types 
identified by MetaNeighbor (overall classifica- 
tion accuracy with scPoli = 0.74, average cell- 
type classification accuracy = 0.68), although 
the classification accuracy varied with cell-type 
neighborhood (ranging from 0.91 across glial 
cell types to 0.69 across IT-type excitatory 
neurons). 
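
The prototype-based annotation step can be illustrated with a nearest-prototype classifier. The sketch below is a stand-in written in plain Python and does not use or reproduce the scPoli API: the prototypes and the latent coordinates are hypothetical, and the Euclidean distance to the winning prototype is returned as a simple proxy for classification uncertainty.

```python
import math


def nearest_prototype(cell, prototypes):
    """Return (label, distance) of the prototype closest to a query cell.

    cell: latent coordinates of one query cell.
    prototypes: maps cell-type label -> latent coordinates of its prototype.
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(cell, proto)  # Euclidean distance in latent space
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical two-dimensional latent space with two prototypes.
prototypes = {"Astro": (0.0, 0.0), "Oligo": (3.0, 4.0)}
label, uncertainty = nearest_prototype((3.0, 3.0), prototypes)
```

A query cell far from every prototype still receives a label, so in practice the distance would be inspected before trusting the annotation.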


Cell-type taxonomy generation 


For each species, a taxonomy was built using 
the final set of clusters and was annotated 
using subclass mapping scores, dendrogram 
relationships, marker gene expression, and 
inferred laminar distributions. Within-species 
taxonomy dendrograms were generated using 
the build_dend function from the scrattch_hicat R 
package. A matrix of cluster median log2(CPM 
+ 1) expression across the 3000 high-variance 
genes for Cv3 nuclei from a given species was 
used as input. The cross-species dendrogram 
was generated with a similar workflow but was 
down-sampled to a maximum of 100 nuclei 
per cross-species cluster per species. The 3000 
high-variance genes used for dendrogram 
construction were identified from the down- 
sampled matrix containing Cv3 nuclei from all 
five species. We generated the complete cross- 
species cluster dendrogram using average- 
linkage hierarchical clustering with (1 - average 
MetaNeighborUS one_vs_all cluster replica- 
bility scores) for each pair of 86 cross-species 
clusters as a measure of distance between cell types. 


Cell-type comparisons across species 
Differential gene expression 


To identify subclass marker genes within a 
species, Cv3 datasets from each species were 
down-sampled to a maximum of 100 nuclei 
per cluster per individual. 
Differentially expressed marker genes were 
then identified using the FindAllMarkers func- 
tion from Seurat, using the Wilcoxon rank-sum 
test on log-normalized matrices with a maxi- 
mum of 500 nuclei per group (subclass versus 
all other nuclei as background). Statistical 
thresholds for markers are indicated in their 
respective figures. To identify species marker 
genes across subclasses and consensus cell 
types, Cv3 datasets from each species were 
down-sampled to a maximum of 50 nuclei 
per cluster per individual. Down-sampled counts 
matrices were then grouped into pseudo-bulk 
replicates (species, individual, subclass or con- 
sensus types) and the counts were summed 
per replicate. DESeq2 functionality was then 
used to perform a differential expression analy- 
sis between species pairs (or comparisons of 
interest) for each subclass or consensus type 
using the Wald test statistic. 
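
The down-sampling and pseudo-bulk aggregation that precede the DESeq2 test can be sketched as below. This is a minimal illustration, not the study's code: the function names, the fixed random seed, and the toy counts are assumptions, and the DESeq2 Wald test itself is not reproduced here.

```python
import random
from collections import defaultdict


def downsample(nucleus_ids, max_n, seed=0):
    """Keep at most max_n nuclei, chosen at random with a fixed seed."""
    ids = list(nucleus_ids)
    if len(ids) <= max_n:
        return ids
    return random.Random(seed).sample(ids, max_n)


def pseudobulk(nuclei):
    """Sum counts per (species, individual, cell_type) replicate.

    nuclei: iterable of (species, individual, cell_type, counts) tuples,
    where counts maps gene name -> count for one nucleus.
    """
    summed = defaultdict(lambda: defaultdict(int))
    for species, individual, cell_type, counts in nuclei:
        replicate = (species, individual, cell_type)
        for gene, count in counts.items():
            summed[replicate][gene] += count
    return {rep: dict(genes) for rep, genes in summed.items()}


# Hypothetical nuclei from two donors of different species.
nuclei = [
    ("human", "H1", "L2/3 IT", {"G1": 2, "G2": 1}),
    ("human", "H1", "L2/3 IT", {"G1": 3}),
    ("chimp", "C1", "L2/3 IT", {"G1": 1}),
]
bulk = pseudobulk(nuclei)
```

Each resulting replicate is one column of the count matrix that a bulk differential expression method would take as input.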


Expression correlations 


Subclasses were compared between each pair 
of species using Spearman correlations on sub- 
class median log2(CPM + 1) expression of ortho- 
logous genes that had a median value greater 
than zero in both species. These Spearman 
correlations were then visualized as heatmaps 
and also compared to the human-centric evo- 
lutionary distance from each species in Fig. 2. 
Similarly, subclasses were compared across 
individuals within each species, and the aver- 
age Spearman correlation of all pairwise com- 
parisons of individuals was calculated. Within 
species correlations were performed on ortholo- 
gous genes with median values greater than zero 
in all donors for a given subclass. Nuclei were 
down-sampled to a maximum of 100 nuclei 
per subclass per donor for comparisons. 
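
The cross-species correlation described above can be sketched in plain Python. The example assumes that subclass median log2(CPM + 1) values have already been computed per gene; the function names and the toy values are hypothetical, and ties are handled with average ranks, which is the usual convention but is an assumption about the original analysis.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_expressed(median_a, median_b):
    """Spearman correlation over genes with median > 0 in both species."""
    genes = sorted(
        g for g in median_a
        if g in median_b and median_a[g] > 0 and median_b[g] > 0
    )
    x = [median_a[g] for g in genes]
    y = [median_b[g] for g in genes]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical subclass medians; G4 is dropped because it is zero in human.
human = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 0.0, "G5": 4.0}
chimp = {"G1": 1.5, "G2": 2.5, "G3": 9.0, "G4": 5.0, "G5": 3.0}
rho = spearman_expressed(human, chimp)
```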


Taxonomy comparisons 


To assess homologies between clusters from 
taxonomies of different species or different 
studies, we constructed Euclidean distance heat- 
maps that were anchored on one side by the 
taxonomies’ dendrogram. The heatmaps dis- 
play the cluster labels of a single taxonomy on 
either end, and the heatmap values represent 
the Euclidean distance between cluster cen- 
troids in the reduced dimensional space using 
30 to 50 principal components from a PC 
analysis. In the case of cross-species compari- 
sons, the reduced space was derived from Cv3 
data. The -log(Euclidean distance) is plotted, 
with smaller distances indicating more similar 
transcriptomic profiles. 
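
A minimal sketch of the heatmap value follows, assuming that nuclei are already embedded in principal-component space and that the logarithm is natural (the text does not state the base). The function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a cluster's nuclei in reduced-dimensional space."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def neg_log_distance(cluster_a, cluster_b):
    """-log of the Euclidean distance between two cluster centroids."""
    dist = math.dist(centroid(cluster_a), centroid(cluster_b))
    if dist == 0:
        raise ValueError("centroids coincide; -log(0) is undefined")
    return -math.log(dist)


# Hypothetical clusters in a two-component space.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]   # centroid (1, 0)
far = [(1.0, 1.0), (1.0, 3.0)]         # centroid (1, 2), distance 2
near = [(1.0, 0.0), (1.0, 1.0)]        # centroid (1, 0.5), distance 0.5
score_far = neg_log_distance(cluster_a, far)
score_near = neg_log_distance(cluster_a, near)
```

Under this transform a shorter centroid distance yields a larger plotted value, so `score_near` exceeds `score_far`.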


Estimating differential isoform usage 
between great apes 


We used Smart-seq snRNA-seq data from hu- 
mans (~14,500 cells), chimpanzees (~3500 cells), 
and gorillas (~4300 cells) to assess isoform 
switching between the species for each cell 
subclass. The RNA-seq reads were mapped 
to each species’ genome using STAR as described 
above. The isoforms were quantified 
using RSEM on a common set of annotated 
transcripts (TransMap V5 downloaded from 
the UCSC browser, RRID:SCR_005780) by 
aggregating reads from cells in each cell sub- 
type using a pseudo-bulk method: 

1. Aggregated reads from cells in each subclass 

2. Mapped reads to the human, chimpanzee, 
or gorilla reference genome with STAR 2.7.7a 
using default parameters 

3. Transformed genomic coordinates into 
transcriptomic coordinates using STAR 
parameter: --quantMode TranscriptomeSAM 

4. Quantified isoform and gene expression 
using RSEM v1.3.3 parameters (RSEM, RRID:SCR_013027): 
--bam --seed 12345 --paired-end 
--forward-prob 0.5 --single-cell-prior --calc-ci 

The isoform proportion metric (isoP) was 
defined as the isoform expression [transcripts 
per million (TPM)] normalized by the total ex- 
pression of the gene the isoform belongs to. To 
focus on highly expressed genes, we considered 
only isoforms originating from the top 50% 
(ranked by gene expression) of genes for each 
species. To control the variability of isoP values, 
we derived the 80% confidence intervals by 
comparing the isoP values of different donors 
for each species using the following procedure: 

1. The isoP values (ranging from 0 to 1) for 
donor 1 are binned into 10 bins of size 0.1. 

2. The isoforms in each bin are sorted by 
the isoP values in donor 2. 

3. The lower and upper bounds of the 80% 
isoP confidence interval are defined as the 10th 
and 90th percentiles of this sorted list. 

4. The procedure was repeated, switching 
donors 1 and 2, and the isoP confidence interval 
bounds from the two calculations 
were averaged. 
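
The isoP definition and the four-step confidence interval procedure above can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: the function names are hypothetical, percentiles are computed by linear interpolation (the text does not specify the method), and an isoP of exactly 1 is placed in the last bin.

```python
def iso_proportions(tpm_by_isoform, gene_of):
    """isoP = isoform TPM divided by the summed TPM of its gene."""
    gene_total = {}
    for iso, tpm in tpm_by_isoform.items():
        gene = gene_of[iso]
        gene_total[gene] = gene_total.get(gene, 0.0) + tpm
    return {
        iso: tpm / gene_total[gene_of[iso]]
        for iso, tpm in tpm_by_isoform.items()
        if gene_total[gene_of[iso]] > 0
    }


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bin_bounds(isop_x, isop_y, n_bins=10):
    """Steps 1-3: bin by donor x, then take the 10th and 90th percentiles
    of donor y's isoP values within each bin (None for empty bins)."""
    bins = [[] for _ in range(n_bins)]
    for iso, p in isop_x.items():
        if iso in isop_y:
            idx = min(int(p * n_bins), n_bins - 1)
            bins[idx].append(isop_y[iso])
    bounds = []
    for vals in bins:
        vals.sort()
        bounds.append((percentile(vals, 0.1), percentile(vals, 0.9)) if vals else None)
    return bounds


def averaged_bounds(isop_1, isop_2, n_bins=10):
    """Step 4: repeat with donors switched and average the bounds."""
    forward = bin_bounds(isop_1, isop_2, n_bins)
    reverse = bin_bounds(isop_2, isop_1, n_bins)
    out = []
    for f, r in zip(forward, reverse):
        if f is None or r is None:
            out.append(f or r)  # fall back to whichever direction has data
        else:
            out.append(((f[0] + r[0]) / 2, (f[1] + r[1]) / 2))
    return out


# Hypothetical isoforms: t1 and t2 belong to gene g1, t3 to gene g2.
props = iso_proportions({"t1": 30.0, "t2": 10.0, "t3": 5.0},
                        {"t1": "g1", "t2": "g1", "t3": "g2"})
donor1 = {"a": 0.05, "b": 0.06, "c": 0.07}
donor2 = {"a": 0.0, "b": 0.05, "c": 0.10}
ci = averaged_bounds(donor1, donor2)
```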

The isoform switching between species was 
considered significant for isoforms whose con- 
fidence intervals were nonoverlapping. We 
defined cross-species isoform switches as those 
that involved a major isoform in one of the 
species (i.e., isoP > 0.5) and report them in 
table S4. A subset of isoforms with strong cross- 
species switching were identified that had isoP 
> 0.7 in one species, isoP < 0.1 in the other 
species, and >3-fold change in proportions be- 
tween the species. 
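
The switching criteria above can be summarized in a small classifier. The sketch is a hypothetical illustration: the category labels and function names are not from the study, and confidence intervals are assumed to be (lower, upper) tuples.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def classify_switch(isop_a, ci_a, isop_b, ci_b):
    """Classify one isoform compared between species a and b."""
    if intervals_overlap(ci_a, ci_b):
        return "not significant"
    high, low = max(isop_a, isop_b), min(isop_a, isop_b)
    # Strong switch: isoP > 0.7 in one species, < 0.1 in the other, and a
    # more than 3-fold change (implied here, since 0.7 / 0.1 exceeds 3).
    if high > 0.7 and low < 0.1 and high > 3 * low:
        return "strong switch"
    # Cross-species switch: the isoform is the major one in one species.
    if high > 0.5:
        return "switch"
    return "significant"


strong = classify_switch(0.8, (0.7, 0.9), 0.05, (0.0, 0.1))
major = classify_switch(0.6, (0.5, 0.7), 0.3, (0.2, 0.4))
overlapping = classify_switch(0.6, (0.4, 0.7), 0.5, (0.45, 0.6))
minor = classify_switch(0.4, (0.35, 0.45), 0.2, (0.1, 0.3))
```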


Identifying changes in cell-type proportions 
across species 


Cell-type proportions are compositional, where 
the gain or loss of one population necessarily 
affects the proportions of the others, so we used 
scCODA (43) to determine which changes in 
cell class, subclass, and cluster proportions 
across species were statistically significant. 
We focused these analyses on neuronal pop- 
ulations because these were deeply sampled 
in all five species based on sorting of nuclei 
with NeuN immunostaining. The proportion 
of each neuronal class, subclass, and cluster 
was estimated using a Bayesian approach 
where proportion differences across individu- 
als were used to estimate the posterior. All 
compositional and categorical analyses require 
a reference population against which to evalu- 
ate differences, and because we were uncertain 
which populations should be unchanged, we 
iteratively used each cell type and each species 
as a reference when computing abundance 
changes. To account for sex differences, we 
included sex as a covariate when testing for 
abundance changes. We report the effect size 
of each species and sex for each cell subclass 
and used a mean inclusion probability cutoff of 
0.7 for calling a population credibly different. 


In situ profiling of gene expression 
MERFISH data collection 


Human postmortem frozen brain tissue was 
embedded in Optimum Cutting Temperature 
medium (VWR, 25608-930) and sectioned on a 
Leica cryostat at -17°C at 10 µm onto Vizgen 
MERSCOPE coverslips (VIZGEN 2040003). These 
sections were then processed for MERSCOPE 
imaging according to the manufacturer’s in- 
structions. Briefly, sections were allowed to 
adhere to these coverslips at room temperature 
for 10 min before a 1-min wash in nuclease-free 
phosphate buffered saline (PBS) and fixation 
for 15 min in 4% paraformaldehyde in PBS. 
Fixation was followed by three 5-min washes 
in PBS before a 1-min wash in 70% ethanol. 
Fixed sections were then stored in 70% eth- 
anol at 4°C before use and for up to one month. 
Human sections were photobleached using a 
150-W LED array for 72 hours at 4°C before 
hybridization then washed in 5 ml of Sample 
Prep Wash Buffer (VIZGEN 20300001) in a 
5-cm petri dish. Sections were then incubated 
in 5 ml of Formamide Wash Buffer (VIZGEN 
20300002) at 37°C for 30 min. Sections were 
hybridized by placing 50 µl of VIZGEN-supplied 
Gene Panel Mix onto the section, covering 
with parafilm, and incubating at 37°C for 36 to 
48 hours in a humidified hybridization oven. 
After hybridization, sections were washed 
twice in 5 ml of Formamide Wash Buffer for 
30 min at 47°C. Sections were then embedded in 
acrylamide by polymerizing VIZGEN Embedding 
Premix (VIZGEN 20300004) according to 
the manufacturer’s instructions. Sections were 
embedded by inverting sections onto 110 µl 
of Embedding Premix and 10% Ammonium 
Persulfate (Sigma A3678) and TEMED (BioRad 
161-0800) solution applied to a Gel Slick (Lonza 
50640) treated 2-inch-by-3-inch glass slide. 
The coverslips were pressed gently onto the 
acrylamide solution and allowed to polymerize 
for 1.5 hours. After embedding, sections were 
cleared for 24 to 48 hours with a mixture of 
VIZGEN Clearing Solution (VIZGEN 20300003) 
and Proteinase K (New England Biolabs P8107S) 
according to the manufacturer’s instructions. 
After clearing, sections were washed twice 
for 5 min in Sample Prep Wash Buffer (PN 
20300001). 

VIZGEN DAPI and PolyT Stain (PN 20300021) 
were applied to each section for 15 min followed 
by a 10-min wash in Formamide Wash Buffer. 
Formamide Wash Buffer was removed and re- 
placed with Sample Prep Wash Buffer during 
MERSCOPE set up. One hundred microliters 
of RNAse Inhibitor (New England BioLabs 
M0314L) was added to 250 µl of Imaging 
Buffer Activator (PN 203000015), and this mix- 
ture was added via the cartridge activation port 
to a prethawed and mixed MERSCOPE Imaging 
cartridge (VIZGEN PN1040004). Fifteen milli- 
liters of mineral oil (Millipore-Sigma m5904- 
6X500ML) were added to the activation port, 
and the MERSCOPE fluidics system was primed 
according to VIZGEN instructions. The flow 
chamber was assembled with the hybridized 
and cleared section coverslip according to 
VIZGEN specifications and the imaging session 
was initiated after collection of a 10X mosaic 
DAPI image and selection of the imaging area. 
For specimens that passed the minimum count 
threshold, imaging was initiated and process- 
ing was completed according to VIZGEN pro- 
prietary protocol. 

The 140-gene human cortical panel was 
selected using a combination of manual and 
algorithm-based strategies requiring a reference 
single-cell RNA-seq or snRNA-seq dataset 
from the same tissue, in this case the human 
MTG snRNA-seq dataset and resulting taxono- 
my (/7). First, an initial set of high-confidence 
marker genes are selected through a combina- 
tion of literature search and analysis of the refer- 
ence data. These genes are used as input for a 
greedy algorithm (detailed below). Second, the 
reference RNA-seq dataset is filtered to only 
include genes compatible with mFISH. Retained 
genes need to be (i) long enough to allow probe 
design (>960 base pairs), (ii) expressed highly 
enough to be detected (FPKM ≥10) but not so 
high as to overcrowd the signal of other genes 
in a cell (FPKM <500), (iii) expressed at low 
levels in off-target cells (FPKM <50 in 
non-neuronal cells), and (iv) differentially 
expressed between cell types (top-500 remain- 
ing genes by marker score20). To more evenly 
sample each cell type, the reference dataset is 
also filtered to include a maximum of 50 cells 
per cluster. 
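
The four compatibility filters above can be sketched as a single selection function. This is a hypothetical illustration: the record fields are assumptions, the lower expression bound is read as "at least 10 FPKM", and how FPKM is summarized across cell types (for example, maximum or mean) is not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    """Per-gene summary used for panel filtering (hypothetical fields)."""
    name: str
    length_bp: int
    fpkm_on_target: float
    fpkm_non_neuronal: float
    marker_score: float


def select_candidates(genes, top_n=500):
    """Apply filters (i)-(iii), then keep the top_n genes by marker score."""
    kept = [
        g for g in genes
        if g.length_bp > 960                 # (i) long enough for probe design
        and 10 <= g.fpkm_on_target < 500     # (ii) detectable but not crowding
        and g.fpkm_non_neuronal < 50         # (iii) low off-target expression
    ]
    kept.sort(key=lambda g: g.marker_score, reverse=True)  # (iv) rank by score
    return [g.name for g in kept[:top_n]]


genes = [
    GeneStats("G1", 2000, 100.0, 5.0, 0.90),
    GeneStats("G2", 500, 100.0, 5.0, 0.95),    # too short
    GeneStats("G3", 3000, 5.0, 1.0, 0.80),     # expression too low
    GeneStats("G4", 3000, 600.0, 1.0, 0.85),   # expression too high
    GeneStats("G5", 3000, 50.0, 80.0, 0.99),   # high in off-target cells
    GeneStats("G6", 1500, 10.0, 0.0, 0.95),
]
panel = select_candidates(genes)
```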

The spatial distribution of human MTG cell 
types was estimated from several sections from 
two donors. For each section, we made two 
manual annotations: a parallelogram spanning 
pia to white matter that selected cells from all 
cortical layers and a line segment from pia to 
white matter along the local cortical column 
axis. The cortical depth was calculated as the 
projection of the coordinates of the selected 
cells onto the cortical column axis. Annota- 
tions were done in napari using a notebook: 
https://github.com/AllenInstitute/Great_Ape_MTG/blob/master/cell_type_mapping/Great_apes_subsetting_cortical_depth.ipynb. 
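
The depth calculation described above amounts to a scalar projection onto the annotated pia-to-white-matter segment. The following is a minimal sketch under stated assumptions: coordinates are two-dimensional, the function name and example coordinates are hypothetical, and the optional normalization to the 0 to 1 range is an added convenience that the text does not specify.

```python
import math

def cortical_depth(cell_xy, pia_xy, wm_xy, normalize=True):
    """Project a cell position onto the pia-to-white-matter axis."""
    ax = wm_xy[0] - pia_xy[0]
    ay = wm_xy[1] - pia_xy[1]
    axis_len = math.hypot(ax, ay)
    if axis_len == 0:
        raise ValueError("pia and white-matter points must differ")
    # Scalar projection of (cell - pia) onto the unit axis vector
    depth = ((cell_xy[0] - pia_xy[0]) * ax + (cell_xy[1] - pia_xy[1]) * ay) / axis_len
    # Optionally express depth as a fraction of the pia-to-white-matter distance
    return depth / axis_len if normalize else depth

pia, wm = (0.0, 0.0), (0.0, 2000.0)  # hypothetical coordinates in micrometers
relative_depth = cortical_depth((350.0, 500.0), pia, wm)
absolute_depth = cortical_depth((350.0, 500.0), pia, wm, normalize=False)
```

Cells displaced sideways from the axis receive the same depth as cells on it, which is the intended behavior when the axis follows the local cortical column.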


Cell-type mapping in MERSCOPE data 


Any genes not matched across both the 
MERSCOPE gene panel and the snRNA-seq 
mapping taxonomy were filtered from the 
snRNA-seq dataset. We calculated the mean 
expression of each gene in each snRNA-seq 
cluster. We then assigned each MERSCOPE cell to 
the snRNA-seq cluster whose mean expression 
vector was nearest to the cell’s expression vector 
by cosine distance. 
All scripts and data used are available at 
https://github.com/AllenInstitute/Great_Ape_MTG. 
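
The nearest-centroid assignment described above can be sketched as follows. This is an illustration rather than the code in the repository: the cluster names, the three-gene vectors, and the convention of returning a distance of 1 for all-zero vectors are assumptions made for the example.

```python
import math

def cosine_distance(u, v):
    """One minus the cosine similarity of two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - dot / (nu * nv)

def map_cell(cell_expr, cluster_means):
    """Return the cluster whose mean vector is nearest by cosine distance."""
    return min(cluster_means, key=lambda c: cosine_distance(cell_expr, cluster_means[c]))

# Hypothetical cluster means over the same three panel genes, in the same order
cluster_means = {
    "L2/3 IT": [8.0, 0.5, 0.2],
    "Pvalb": [0.3, 6.0, 0.4],
    "Astro": [0.1, 0.2, 9.0],
}
assigned = map_cell([0.0, 3.0, 1.0], cluster_means)
```

Because cosine distance ignores vector magnitude, a cell with few detected transcripts can still be matched to a cluster with a similar expression profile.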

The main step of gene selection uses a greedy 
algorithm to iteratively add genes to the initial 
set. To do this, each cell in the filtered reference 
dataset is mapped to a cell type by taking the 
Pearson correlation of its expression with each 
cluster median using the initial gene set of 
size n, and the cluster corresponding to the 
maximum value is defined as the “mapped 
cluster.” The “mapping distance” is then de- 
fined as the average cluster distance between 
the mapped cluster and the originally assigned 
cluster for each cell. In this case a weighted 
cluster distance, defined as one minus the 
Pearson correlation between cluster medians 
calculated across all filtered genes, is used to 
penalize cases where cells are mapped to very 
different types, but an unweighted distance, 
defined as the fraction of cells that do not map 
to their assigned cluster, could also be used. 
This mapping step is repeated for every possible 
n + 1 gene set in the filtered reference 
dataset, and the set with minimum mapping 
distance is retained as the new gene set. These 
steps are repeated using the new gene set (of 
size n + 1) until a gene panel of the desired size 
is attained. Code for reproducing this gene 
selection strategy is available as part of the 
mfishtools R library (https://github.com/AllenInstitute/mfishtools). 
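
The greedy procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the mfishtools implementation: all function names and the toy data are hypothetical, correlations involving a constant vector are set to 0 by convention, and ties are broken by cluster or gene order.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant vectors (a convention chosen here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cluster_medians(cells, labels, genes):
    """Median expression of every gene within every cluster."""
    meds = {}
    for c in sorted(set(labels)):
        members = [cell for cell, lab in zip(cells, labels) if lab == c]
        meds[c] = {g: median(m[g] for m in members) for g in genes}
    return meds

def mapping_distance(cells, labels, meds, cdist, panel):
    """Average cluster distance between mapped and originally assigned clusters."""
    total = 0.0
    for cell, lab in zip(cells, labels):
        vec = [cell[g] for g in panel]
        mapped = max(meds, key=lambda c: pearson(vec, [meds[c][g] for g in panel]))
        total += cdist[lab][mapped]
    return total / len(cells)

def greedy_panel(cells, labels, all_genes, initial, size):
    """Grow the initial gene set one gene at a time, minimizing mapping distance."""
    meds = cluster_medians(cells, labels, all_genes)
    # Weighted cluster distance: one minus the correlation between cluster medians
    cdist = {a: {b: 0.0 if a == b else
                 1.0 - pearson([meds[a][g] for g in all_genes],
                               [meds[b][g] for g in all_genes])
                 for b in meds} for a in meds}
    panel = list(initial)
    while len(panel) < size:
        candidates = [g for g in all_genes if g not in panel]
        best = min(candidates,
                   key=lambda g: mapping_distance(cells, labels, meds, cdist, panel + [g]))
        panel.append(best)
    return panel

# Toy data (hypothetical): g3 separates clusters B and C, g4 is uninformative
profiles = {"A": {"g1": 10, "g2": 0, "g3": 0, "g4": 5},
            "B": {"g1": 0, "g2": 10, "g3": 10, "g4": 5},
            "C": {"g1": 0, "g2": 10, "g3": 0, "g4": 5}}
labels = ["A", "A", "B", "B", "C", "C"]
cells = [dict(profiles[lab]) for lab in labels]
panel = greedy_panel(cells, labels, ["g1", "g2", "g3", "g4"], ["g1", "g2"], 3)
```

In the toy data, adding g4 leaves clusters B and C indistinguishable, so the greedy step selects g3. Each iteration evaluates every remaining gene, so the cost grows with the product of panel size, candidate genes, and cells; capping the reference at 50 cells per cluster, as described in the text, keeps this tractable.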

We used our 140-gene MERFISH panel de- 
signed to identify human cortical cell types to 
map every type described in this updated 
human MTG taxonomy to determine cell-type 
locations within cortex and confirm cell-type 
proportions. All cell-type locations are pro- 
vided for reference in graphical format as 
localized in a representative human MTG sec- 
tion H19.30.001.Cx46.MTG.02.02.007.5 (data S1). 


RNAscope 


Fresh-frozen human postmortem brain tissues 
were sectioned at 16 to 25 μm onto Superfrost 
Plus glass slides (Fisher Scientific). Sections 
were dried for 20 min on dry ice and then 
vacuum sealed and stored at −80°C until use. 
The RNAscope multiplex fluorescent V2 kit was 
used per the manufacturer’s instructions for 
fresh-frozen tissue sections (ACD Bio), except 
that slides were fixed 60 min in 4% paraformaldehyde 
in 1X PBS at 4°C and treated with 
protease for 15 min. Sections were imaged using 
a 40x oil immersion lens on a Nikon TiE fluo- 
rescence microscope equipped with NIS- 
Elements Advanced Research imaging software 
(v4.20, RRID:SCR_014329). Positive cells were 
called by manual assessment of RNA spots for 
each gene. Cells were called positive for a gene 
if they contained ≥5 RNA spots for that gene. 

High versus low expression of CUX2 was 
determined by measuring fluorescence intensity 
for that gene in ImageJ. Lipofuscin autofluor- 
escence was distinguished from RNA spot signal 
based on the broad fluorescence spectrum and 
larger size of lipofuscin granules. Staining for 
each probe combination was repeated with sim- 
ilar results on at least two separate individuals 
and on at least two sections per individual. 
Images were assessed with the FIJI distri- 
bution of ImageJ v1.52p and with NIS-Elements 
v4.20. RNAscope probes used were CUX2 (ACD 
Bio, 425581-C3), LDB2 (1003951-C2), and SMYD1 
(493951-C2). 

Fresh-frozen marmoset brain tissue was 
sectioned and processed for RNAscope stain- 
ing as described above for human. Sections 
were imaged with a 10x lens on a Nikon TiE 
fluorescence microscope to collect large over- 
view images and smaller regions of tissue were 
re-imaged using a 40x oil immersion lens. 
Images were assessed as above for human 
except that lipofuscin autofluorescence was 
not apparent in marmoset tissues. RNAscope 
probes used were CUX2 (ACD Bio, 554631-C2), 
NTNG2 (ACD Bio, custom probe targeting 
base pairs 1894 to 2819 of XM_035261022.2), 
and MGAT4C (custom probe targeting base 
pairs 704 to 1799 of XM_035257223.2). Stain- 
ing for this probe combination was repeated 
on three sections from one individual. On all 
sections, an area of probe signal dropout was 
noted at the same location in the secondary 
auditory cortex that we attribute to a poten- 
tial imaging or experimental artifact. All three 
probes had reduced signal intensity in this 
area, and the area is marked in the figure panel 
displaying the RNAscope data (Fig. 2F) with a 
red asterisk. 


Analysis of great ape species pairwise 
comparison for glial cells 


We used 10x snRNA-seq data for the compari- 
son of normalized gene expression across spe- 
cies. Significant differential gene expression in 
pairwise comparisons of glial cells (astrocytes, 
microglia, oligodendrocytes) across great ape 
species was determined at log2 fold-change 
>0.5 and FDR <0.01. Among DEGs from great 
ape pairwise comparisons, species-specific 
highly divergent genes were identified as having 
>10-fold change in expression in a given species 
relative to the other two great ape species, and 
with a threshold of gene expression of normalized 
gene counts >5 in at least one species. GO 
enrichment analysis was performed using the 
Bioconductor package “clusterProfiler” 
(https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html), 
and Fisher’s exact test was used for SynGO enrichment 
analysis (https://www.syngoportal.org/). GO and 
SynGO analyses were performed on the union 
of DEGs from the pairwise comparison be- 
tween human and chimpanzee and the pair- 
wise comparison between human and gorilla to 
increase power to detect significant GO terms. 
GO terms under biological process, molecular 
function, and cellular component categories 
were considered in the analysis. Significance 
for enriched terms was determined at 5% FDR. 
All MTG expressed genes in the consensus cell 
types (astrocytes, microglia, oligodendrocytes) 
were considered as the background gene set 
in the respective analyses. Gene expression 
change in glial cell types shown in heatmaps 
(Fig. 3E and figs. S5, A, B, and G, and S6, E 
and F) is calculated as the log2 ratio of nor- 
malized expression counts in a given species 
relative to the other two great ape species. 
To analyze astrocyte genes associated with 
perisynaptic astrocytic processes, a list of genes 
encoding proteins enriched at astrocyte-neuron 
junctions was used from a proteomic study in 
the mouse cortex (29). To analyze microglia 
genes associated with intercellular communi- 
cation and signaling, a list of genes predicted 
to act as the ligand-receptor interactome of 
microglia-neuron communication was used 
from a recent study in the mouse cortex (85). 
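
The thresholds described above can be expressed as two small predicates. This sketch makes several assumptions that the text does not state: the fold-change cutoff is applied to the absolute log2 fold change, a gene is called highly divergent only when the >10-fold difference holds against each of the other two species separately, and the example counts are hypothetical.

```python
def is_deg(log2fc, fdr, lfc_cut=0.5, fdr_cut=0.01):
    """Pairwise DEG call; assumes the cutoff applies to the absolute fold change."""
    return abs(log2fc) > lfc_cut and fdr < fdr_cut

def is_highly_divergent(expr, species, fold=10.0, min_count=5.0):
    """expr maps species name to the normalized count of one gene in one cell type."""
    # Require normalized counts above the minimum in at least one species
    if max(expr.values()) <= min_count:
        return False
    target = expr[species]
    others = [v for s, v in expr.items() if s != species]
    higher = all(target > fold * v for v in others)   # >10-fold higher than both
    lower = all(target * fold < v for v in others)    # >10-fold lower than both
    return higher or lower

# Hypothetical normalized counts for a single gene
expr = {"human": 60.0, "chimpanzee": 4.0, "gorilla": 5.0}
human_divergent = is_highly_divergent(expr, "human")
chimp_divergent = is_highly_divergent(expr, "chimpanzee")
```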


Enrichment of HARs and hCONDELs 
near hDEGs 


The set of HARs used in our analysis was obtained 
from (14), and the set of HAQERs used 
in our analysis was obtained from (47). The set 
of hCONDELs was obtained from (45, 46), and 
only hCONDELs that could be mapped to a syn- 
tenic orthologous location in hg38 were retained 
(1175 total) (86). We assigned intronic HARs, 
HAQERs, and hCONDELs to the genes they 
are intronic to and intergenic HARs, HAQERs, 
and hCONDELs to the closest upstream and 
downstream genes (table S8) using Ensembl 
GRCh38 annotations obtained in May 2021 and 
Ensembl Pan_tro_3.0, gorGor4, and Mmul_10 
annotations obtained in January 2023. With re- 
spect to the human annotations, 63.2% of HARs, 
53.7% of HAQERs, and 59.4% of hCONDELs are 
intronic. For 83.2% of the 1165 intergenic HARs, at 
least one of their assigned genes is within 100 kb. 
For 90.7% of the 732 intergenic HAQERs, at least 
one of their assigned genes is within 100 kb. For 
85.5% of the 477 intergenic hCONDELs, at least 
one of their assigned genes is within 100 kb. The 
proportion of intronic and intergenic HARs and 
hCONDELs is similar for the chimpanzee, gorilla, 
and macaque annotations. We considered 
HARs, HAQERs, and hCONDELs to be 
enriched near DEGs in a specific cell type if 
they are significant at 5% FDR for both of the 
following tests (87): (i) Are DEGs enriched for 
genes near HARs, HAQERs, and/or hCONDELs 
(Fisher’s exact test)? We set the background as 
expressed genes, which adjusts for the fact that 
HARs, HAQERs, and hCONDELs are known to 
be enriched near neural genes. (ii) Are HARs, 
HAQERs, and/or hCONDELs more likely to fall 
near DEGs than expected by chance? We as- 
signed each gene a regulatory domain that com- 
prises the genomic interval containing the gene 
along with the upstream and downstream inter- 
genic regions that extend to the nearest flank- 
ing genes, with an upper bound of 5 Mb in total 
size. We then asked whether HARs, HAQERs, 
and hCONDELs are enriched within the reg- 
ulatory domains of DEGs using the binomial 
test. This takes into account differences in ge- 
nomic structure between genes, under the as- 
sumption that HARs, HAQERs, and hCONDELs 
will be more likely to fall within the regulatory 
domains of genes with large intronic or flank- 
ing regions by chance. 
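
The two tests above can be sketched with the standard library alone. This is a hedged illustration, not the authors' code: both tests are written as one-sided enrichment tests, the binomial success probability is assumed to be the fraction of the genome covered by the regulatory domains of DEGs, and all function and parameter names are hypothetical. Multiple-testing correction is left to the caller, who supplies FDR-adjusted values to `enriched`.

```python
from math import comb, exp, lgamma, log

def fisher_enrichment_p(n_background, n_near, n_deg, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) under the hypergeometric null.

    n_background: expressed genes; n_near: expressed genes near an element;
    n_deg: DEGs; n_overlap: DEGs that are near an element.
    """
    upper = min(n_near, n_deg)
    tail = sum(comb(n_near, k) * comb(n_background - n_near, n_deg - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_background, n_deg)

def binomial_enrichment_p(n_elements, n_in_domains, domain_fraction):
    """P(X >= n_in_domains) for X ~ Binomial(n_elements, domain_fraction).

    domain_fraction must lie strictly between 0 and 1.
    """
    def log_pmf(k):
        # Log-space evaluation avoids overflow for large element counts
        return (lgamma(n_elements + 1) - lgamma(k + 1) - lgamma(n_elements - k + 1)
                + k * log(domain_fraction)
                + (n_elements - k) * log(1.0 - domain_fraction))
    return min(1.0, sum(exp(log_pmf(k)) for k in range(n_in_domains, n_elements + 1)))

def enriched(q_fisher, q_binomial, fdr=0.05):
    """Call enrichment only when both FDR-adjusted values pass the threshold."""
    return q_fisher < fdr and q_binomial < fdr

p_gene_level = fisher_enrichment_p(10, 5, 5, 5)
p_element_level = binomial_enrichment_p(3, 3, 0.5)
```

Requiring both tests guards against two different biases: the gene-level test controls for the set of expressed genes, while the element-level test controls for genes with large regulatory domains attracting elements by chance.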


SynGO and synaptic gene family enrichment 


To analyze the association between synaptic 
terms and human divergent gene expression 
patterns, we used an expert-curated database 
of GO annotations of synapse-related terms 
known as SynGO (28). To test whether hDEGs 
and hDEGs near HARs and hCONDELs are 
enriched in SynGO terms, we used Fisher’s ex- 
act test. We focused on SynGO terms within the 
first and second hierarchical levels of SynGO 
that broadly comprise the entire range of Cel- 
lular Components (CC) and Biological Processes 
(BP) terms, allowing for the visualization of en- 
richment patterns across a wide range of syn- 
aptic localizations and processes (fig. S18). We 
grouped SynGO terms into two levels based 
on their hierarchical organization in SynGO 
(https://www.syngoportal.org/), corresponding 
to the following reference codes: 11 terms within 
level 1 (A1, B1, C1, D1, E1, F1, G1, H1, I1, J1, K1) 
and 71 terms within level 2 (A2-3, B2-11, C2, 
D2-11, E2, F2-10, G2-7, H2-4, I2-15, J2-11, K2-6). 
For synaptic gene families, we examined 15 func- 
tionally related categories: (i) families of cell- 
adhesion and synaptic-adhesion molecules, 
(ii) families of ligand-receptor complexes in- 
volved in growth factor signaling, (iii) families 
of other cell-surface receptors and ligands, (iv) 
families of other G protein-coupled receptors 
(GPCRs) and their ligands (including orphan 
GPCRs), (v) families of ligand-receptor com- 
plexes involved in neuropeptidergic signaling 
and related GPCRs and ligands, (vi) families 
of neurotransmitter-gated receptors and other 
ligand-gated receptors (including glutamate 
ionotropic receptors), (vii) Ras GTPase super- 
family, (viii) families of Ras GAP and GEF sig- 
naling molecules, (ix) families of other regulatory 
molecules and structural scaffolding proteins, 
(x) families related to other signaling com- 
plexes including intracellular kinases and phos- 
phatases, (xi) families related to the ECM and 
proteoglycan families, (xii) families related to 
cytoskeletal composition and organization and 
other related proteins, (xiii) families involved in 
synaptic vesicle exocytosis and other membrane 
fusion components, (xiv) families of proteases 
and peptidases, and (xv) families of voltage- 
gated ion channels and other gated ion chan- 
nels and solute transporters. For each of these, 
we assembled a comprehensive list based on 
HGNC reference and a previously curated cat- 
alog of synaptic molecules (88) (table S8). Sig- 
nificance was determined at 5% FDR. “All” 
genes are genes that are expressed and can be 
assessed for differential expression by DESeq2 
in at least one consensus type. 
A comparative atlas of single-cell chromatin 
accessibility in the human brain 
INTRODUCTION: Neuropsychiatric disorders and 
mental illnesses are the leading cause of disease 
burden in the United States. Tens of thousands 
of sequence variants in the human genome have 
been linked to the etiology of these conditions. 
However, elucidating the role of the identified 
risk variants remains a challenge because most 
of them are outside of protein-coding regions 
and currently lack functional annotation. These 
disease risk variants likely exert their influence 
by perturbing transcriptional regulatory ele- 
ments, thereby modulating gene expression in 
cell types pertinent to neuropsychiatric disor- 
ders. Recent advancements in single-cell tech- 
nologies have unveiled a high degree of cellular 
heterogeneity across the human brain. However, 
the transcriptional regulatory sequences govern- 
ing the identity and function of each individual 
Single-cell analysis of chromatin accessibility of the human brain. Candidate cis-regulatory elements 
(cCREs) specific to distinct human brain cell types were identified by single-nucleus assay for transposase-accessible 
chromatin using sequencing (snATAC-seq) and linked to putative target genes through integrative analysis. The 
usage of cCREs was leveraged to predict brain cell types pertinent to neuropsychiatric traits and disorders and to train 
machine-learning models to interpret the function of noncoding risk variants. UMAP, Uniform Manifold Approximation 
and Projection; DL, deep learning; LD, linkage disequilibrium; GWAS, genome wide association study. 
ing our ability to interpret the noncoding disease 
risk variants. 


RATIONALE: Conventionally, transcriptional reg- 
ulatory sequences may be determined by evi- 
dence of chromatin accessibility that generally 
accompanies transcription factor binding 
and chromatin remodeling. However, prior 
catalogs of transcriptional regulatory elements 
lack the information about cell-type-specific 
activities of each element because of the use of 
bulk tissue samples. Recent technological strides 
have empowered us to analyze chromatin acces- 
sibility at the single-cell level, enabling the cre- 
ation of cell-type-specific maps of transcriptional 
regulatory elements for complex organs such 
as the human brain. 


RESULTS: In this study, we present a compre- 
hensive analysis of chromatin accessibility in the 
human brain at the single-cell level, encompass- 
ing a collection of 1.1 million cells from 42 dis- 
tinct brain regions in three neurotypical adult 
subjects. We used this chromatin atlas to de- 
fine 107 distinct brain cell types and uncovered 
the state of chromatin accessibility at 544,735 
putative transcriptional regulatory elements in 
these cell types. A substantial number of these 
regulatory elements also exhibited both se- 
quence conservation and chromatin accessi- 
bility in mouse brain cells, underlining their 
functional importance. Through integrative 
analysis, we have linked many putative tran- 
scriptional regulatory elements to potential tar- 
get genes. We further leveraged this atlas to 
predict disease relevant cell types for 19 neuro- 
psychiatric traits and disorders. Finally, we 
developed machine learning models to predict 
the regulatory function of disease risk variants. 
We have made this atlas freely available to 
the public through an interactive web portal 
CATLAS (www.catlas.org). 


CONCLUSION: The single-cell chromatin atlas of 
the human brain represents a valuable resource 
for the neuroscience community. It offers in- 
sights into the gene-regulatory programs shap- 
ing the diversity of brain cell types and aids in 
interpreting the functional roles of disease risk 
variants located outside of protein-coding re- 
gions. This atlas, in combination with other 
molecular and anatomical data, promises to 
advance our understanding of brain function 
and neuropathology, ultimately offering ave- 
nues for more effective approaches to address- 
ing neuropsychiatric disorders. 


Recent advances in single-cell transcriptomics have illuminated the diverse neuronal and glial cell 
types within the human brain. However, the regulatory programs governing cell identity and function 
remain unclear. Using a single-nucleus assay for transposase-accessible chromatin using sequencing 
(snATAC-seq), we explored open chromatin landscapes across 1.1 million cells in 42 brain regions 
from three adults. Integrating this data unveiled 107 distinct cell types and their specific utilization of 
544,735 candidate cis-regulatory DNA elements (cCREs) in the human genome. Nearly a third of the 
cCREs demonstrated conservation and chromatin accessibility in the mouse brain cells. We reveal 
strong links between specific brain cell types and neuropsychiatric disorders including schizophrenia, 
bipolar disorder, Alzheimer’s disease (AD), and major depression, and have developed deep learning 
models to predict the regulatory roles of noncoding risk variants in these disorders. 


Neuropsychiatric disorders and mental 
illnesses are the leading cause of disease 
burden in the United States (1). Tens of 
thousands of sequence variants in the 
human genome have been linked to the 
etiology of neuropsychiatric disorders (2, 3). 
However, interpreting the mode of action of 
the identified risk variants remains a daunt- 
ing challenge because most of them are non- 
protein coding and remain to be annotated 
(4, 5). A large fraction of the noncoding risk 
variants might contribute to disease etiology by 
perturbing transcriptional regulatory sequences 
and disrupting gene expression in disease-relevant 
cell types (6, 7). However, a lack of maps and 
tools to explore gene activities and their tran- 
scriptional regulatory sequences at high cellular 
and anatomical resolution in the brain presents 
a major barrier to obtaining a clearer mecha- 
nistic understanding of the broad spectrum of 
neuropsychiatric disorders. 

The human brain is made up of hundreds of 
billions of neurons, which through trillions of 
synapses form a complex neurocircuitry to per- 
form diverse neurocognitive functions. The func- 
tionality of the neural circuitry is supported 
and maintained by an even greater number of 
glial cells, including astrocytes (ASCs), oligo- 
dendrocytes (OGCs), oligodendrocyte precursor 
cells (OPCs), microglia (MGCs), and others. Single- 
cell RNA sequencing (scRNA-seq) and high- 
throughput imaging experiments have produced 
detailed cell taxonomies for mouse and hu- 
man brains (8-11), leading to a comprehensive 
view of cell types and their molecular signa- 
tures in many brain areas (12-14). Analyses 
of gene expression patterns using single-cell 
transcriptomics and spatial transcriptomics 
assays (8, 10, 15-17) have further advanced our 
understanding of the transcriptional land- 
scapes in different brain cell types. By com- 
parison, analysis of the regulatory elements 
that drive the cell-type-specific expression of 
genes is lagging. Current catalogs of candidate 
regulatory sequences in the human genome, 
most notably those generated by the ENCODE 
and Epigenome Roadmap consortia (6, 7, 18, 19), 
still lack the information about cell-type-specific 
activities of each element, especially those iden- 
tified from brain tissues, because conventional 
assays performed using bulk tissue samples 
unfortunately fail to resolve cis-regulatory DNA 
elements (cCREs) in individual cell types comprising 
the heterogeneous tissues. Recent technological 
advances have enabled the analysis of 
open chromatin at single-cell resolution (20-23) 
in adult mouse tissues (20, 22, 24), generating 
cell-type-specific maps of gene-regulatory ele- 
ments for a limited number of human brain 
cell types and brain regions (25, 26). 

As part of the BRAIN Initiative Cell Census 
Network (BICCN), we have performed single- 
cell profiling of the transcriptome, chromatin 
accessibility (CA), and the DNA methylome 
across >40 regions in the human brain from 
multiple neurotypical adult donors. Here, we 
generated a single-cell CA atlas comprising 
~1.1 million human brain cells. We used this 
chromatin atlas to define 107 distinct brain cell 
types and to uncover the state of CA at 544,735 
cCREs in one or more of these cell types. We 
found that nearly a third of the cCREs demon- 
strated conservation and CA in the mouse brain 
cells. We integrated our chromatin atlas with 
the single-cell transcriptome and DNA meth- 
ylome atlases to link cCREs to putative target 
genes. We further predicted disease-relevant 
cell types for 19 neuropsychiatric disorders. 
Finally, we developed machine learning mod- 
els to predict the regulatory function of dis- 
ease risk variants and created an interactive 
web atlas to disseminate this resource, which 
we named the “cis-element atlas” or “CATlas” 
(http://catlas.org). 


A single-cell CA atlas of human brains 


We dissected 42 brain regions from the hu- 
man cortex (CTX), hippocampus (HIP), basal 
nuclei (BN), midbrain (MB), thalamus (THM), 
cerebellum (CB), and pons (PN) [according 
to the Allen Brain Reference Atlas (27)] from 
three neurotypical male donors (D1, D2, and 
D4), ages 29, 42, and 58, respectively (Fig. 1A and 
table S1). For each brain sample, we performed 
snATAC-seq using a protocol described previ- 
ously (28) (Fig. 1A; fig. S1, A to D; and table S2). 
Data reliability was confirmed by sequencing 
reads showing nucleosome-like periodicity 
(fig. S1E), excellent correlation between datasets 
from similar brain regions (fig. S1F), high 
enrichments of reads near transcription start 
sites (TSSs), and other quality control metrics 
(see the materials and methods). A total of 
1,290,974 nuclei passed a set of quality control 
thresholds (fig. S1G, and materials and methods). 
After removing an additional 156,614 snATAC- 
seq profiles that likely resulted from barcode 
collision or doublets (fig. S1, H to J, and mate- 
rials and methods), a total of 1,134,360 nuclei 
were retained. Among them, 595,713 were from 
CTX, 72,190 from HIP, 317,480 from BN, 23,114 
from MB, 50,768 from THM, 51,775 from CB, 
and 25,459 from PN (table S3). On average, 
4970 chromatin fragments were detected in 
each nucleus (table S3; fig. S1, K to M; and 
materials and methods). 
... colored by cell subclasses. (E) UMAP embedding of GABAergic neurons, colored by brain regions. (F) UMAP embedding and clustering analysis of non-neurons, 
... relative contribution of brain regions to cell subclasses. Right, regional specificity scores of cell subclasses. 
We next performed iterative clustering with 
snATAC-seq profiles and classified them into 
three major classes, with class I enriched for 
glutamatergic (vGlut+, putatively excitatory) 
neurons (11.8%), class II enriched for GABAergic 
(GABA+, putatively inhibitory) neurons (6.8%), 
and class III enriched for non-neuronal cells 
(81.4%) (Fig. 1, B, D, and F; fig. S2; and fig. S3, 
A and B). Iterative clustering further classified 
the three major classes into 14 subclasses of 
vGlut+ neurons, two subclasses of granule cell 
types, one subclass of cholinergic neurons, four 
subclasses of dopaminergic neurons, two subclasses 
of thalamic- and MB-derived neurons, 
11 subclasses of cortical GABA+ neurons, and 
eight subclasses of non-neuronal cells (Fig. 1, 
B, D, and F). Each subclass was annotated 
based on CA at promoters and gene bodies of 
at least three marker genes of known brain cell 
types, together with the brain region where 
the cells reside (Fig. 1, C, E, and G; fig. S3C; and 
tables S4 and S5). For each subclass, we also 
conducted a third round of clustering and iden- 
tified a union list of 107 distinct cell types (Fig. 
1H, fig. S4, table S3, and materials and meth- 
ods). To determine the optimal number of cell 
types within each subclass, we evaluated the 
relative stability from a consensus matrix based 
on 100 rounds of clustering with randomized 
starting seeds. We then calculated the propor- 
tion of ambiguous clustering (PAC) score and 
dispersion coefficient (DC) to find the optimal 
resolution (local minimum and maximum) for 
cell type clustering (fig. S2, B to E). For exam- 
ple, vasoactive intestinal peptide-expressing 
(VIP+) neurons were further divided into multiple 
cell types with distinct CA at multiple gene 
loci (Fig. 1I; fig. S2, B to E; and fig. S4). We found 
that the clustering result of snATAC-seq was 
robust to variation of sequencing depth and 
signal-to-noise ratios, and most cell subclasses 
showed no batch effect from at least two biolog- 
ical replicates using local inverse Simpson’s 
index analysis, with the exception of two cell 
subclasses, granule cells from the subiculum 
and vascular smooth muscle cells, that were 
mostly captured from one donor (fig. S5). To 
capture the relative similarity in chromatin 
landscapes among the 42 subclasses, we con- 
structed a robust hierarchical dendrogram 
showing known organizing principles of human 
brain cells. The non-neuronal class is separated 
from the neuronal class, which was 
further separated based on neurotransmitter 
types (GABA+, dopaminergic, cholinergic, and 
vGlut+) and developmental origins (Fig. 1H, 
fig. S6, and materials and methods). 
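
The stability metrics mentioned above can be computed directly from a consensus matrix. The sketch below uses commonly cited definitions, which are assumptions here because the text does not give formulas: the PAC score is taken as the fraction of cell pairs whose consensus value falls strictly between a lower and an upper bound (0.05 and 0.95 are illustrative choices), and the dispersion coefficient as the mean of 4(c − 0.5)² over all matrix entries. Function names and the toy clustering runs are hypothetical.

```python
def consensus_matrix(runs):
    """Fraction of clustering runs in which each pair of cells shares a cluster."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]

def pac_score(m, lower=0.05, upper=0.95):
    """Proportion of ambiguous clustering: lower values indicate more stable clusters."""
    n = len(m)
    pairs = [m[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(lower < v < upper for v in pairs) / len(pairs)

def dispersion_coefficient(m):
    """Equals 1.0 when every consensus entry is exactly 0 or 1."""
    n = len(m)
    return sum(4.0 * (v - 0.5) ** 2 for row in m for v in row) / (n * n)

# Hypothetical runs over four cells: identical partitions versus conflicting ones
stable = consensus_matrix([[0, 0, 1, 1]] * 4)
unstable = consensus_matrix([[0, 0, 1, 1], [0, 1, 0, 1]])
```

Under these definitions, a well-chosen clustering resolution corresponds to a local minimum of the PAC score and a local maximum of the dispersion coefficient, matching the selection rule described in the text.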

As expected, most neuronal cell types and 
some glial cell types were distributed in the 
human brain in a nonuniform fashion (Fig. 
1J). We defined a regional specificity score for 
each subclass based on the contribution from 
different brain regions. Although the majority 
of glial cell types were ubiquitously distributed 
throughout the brain, showing very low regional 
specificity (Fig. 1J, right), there were 
exceptions. For example, Bergmann glia, also 
called Golgi epithelial cells, are specialized, 
unipolar glial cells featuring cell bodies situated 
in the Purkinje cell layer and radial fibers 
in the CB (29). This cell type was specifically 
found in the CB (Fig. 1, H and J). Conversely, 
most neuronal types were characterized by 
regional specificity (Fig. 1J, right). We found 
a stark separation based on brain subregions 
for distinct neuron types, including the gran- 
ular cells in the CB and medium spiny neurons 
(MSNs) in the basal ganglia. For vGlut+ neurons, 
we also observed that distinct types of 
intratelencephalic (IT) cortical neurons in the 
primary visual cortex (V1C) and excitatory neurons 
from the amygdala were highly restricted 
to specific brain regions or dissections. We 
also compared our cell clusters with the cell 
taxonomy defined from other modalities and 
attained a good level of agreement (fig. S7 and 
supplementary text). 


Mapping and characterization of human 
brain cCREs 


As a first step toward defining the gene-regulatory 
programs that underlie the identity and func- 
tion of each brain cell type, we identified the 
open chromatin and cCREs in each of the 107 
brain cell types. We aggregated the CA profiles 
from the nuclei comprising each cell cluster or 
type and identified the open chromatin re- 
gions with MACS2 (30) (fig. S8A). We filtered 
the resulting accessible chromatin regions based 
on whether they were called in at least two do- 
nors, or pseudoreplicates (fig. S8, A and B). 
From our previous study, we found that read 
depth or cluster size can affect MACS2 peak 
calling scores (28). Therefore, we used “score 
per million” (31) to correct this bias (fig. S8C 
and materials and methods). About 1000 nu- 
clei were sufficient to identify >80% of the ac- 
cessible regions in a cell type, consistent with 
our previous finding (28) (fig. S8C). We itera- 
tively merged the open chromatin regions iden- 
tified from every cell type and kept the summits 
with the highest MACS2 score for overlapped 
regions. On average, we detected 62,045 open 
chromatin regions per cell type (each 500 bp 
in length) and a union of 544,735 open chro- 
matin regions across all 107 cell types (fig. S8D, 
tables S6 and S7, and materials and methods). 
These cCREs together made up 8.8% of the 
human genome (hg38) (table S8). Of these 
cCREs, 95.3% were located at least 2 kbp away 
from annotated promoter regions of protein- 
coding and long noncoding RNA genes (Fig. 
2A and table S8). The promoter-distal cCREs 
were distributed in introns (34.8%), intergenic 
(27.8%), and other genomic regions. Twenty- 
two percent of them overlapped with endoge- 
nous retrotransposable elements, including 
the long terminal repeat (LTR) class (6.8%), 
the long interspersed nuclear element (LINE) 
class (11.3%), and the short interspersed nuclear 
element (SINE) class (3.9%). Several lines of 
evidence support the authenticity of the iden- 
tified cCREs. First, both proximal and distal 
cCREs showed higher sequence conservation 
than random genomic regions with similar 
GC content (Fig. 2B). Second, 89.6% of cCREs 
overlapped with DNase hypersensitive sites 
(DHSs) previously mapped in a broad spec- 
trum of bulk human tissues and cell types, 
including fetal and adult brains (32). This list 
further expands cCREs previously annotated in 
the human genome by the ENCODE consor- 
tium (19) and a recent survey of CA in single 
nuclei across fetal and adult human tissues 
(33) (Fig. 2C). 
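
The score normalization and iterative merging described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: "score per million" is taken to mean each peak score divided by the sum of all peak scores in the same cell type and multiplied by one million, peaks are treated as fixed 500-bp windows centered on their summits, and the function names and example peaks are hypothetical.

```python
def score_per_million(scores):
    """Normalize peak scores within one cell type so that they sum to one million."""
    total = sum(scores)
    return [s * 1e6 / total for s in scores]

def merge_peaks(peaks, half_width=250):
    """Keep the highest-scoring summit among overlapping fixed-width windows.

    peaks: list of (chrom, summit, normalized_score) pooled across cell types.
    """
    kept = []
    # Visit peaks from highest to lowest score so the best summit wins each overlap
    for chrom, summit, score in sorted(peaks, key=lambda p: -p[2]):
        overlaps = any(c == chrom and abs(summit - s) < 2 * half_width
                       for c, s, _ in kept)
        if not overlaps:
            kept.append((chrom, summit, score))
    return sorted(kept)

normalized = score_per_million([1, 3])

# Hypothetical summits pooled from several cell types
pooled = [("chr1", 1000, 5.0), ("chr1", 1200, 9.0),
          ("chr1", 2000, 2.0), ("chr2", 1100, 1.0)]
union = merge_peaks(pooled)
```

Normalizing before merging matters because raw peak scores scale with read depth and cluster size, so without it peaks from large clusters would systematically displace those from rare cell types. The pairwise overlap check is quadratic and is meant only to show the logic; an interval index would be needed at the scale of hundreds of thousands of peaks.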

To define the cell-type specificity of the cCREs, 
we first plotted the median levels of CA against 
the maximum variation for each element (Fig. 
2D). We found that most cCREs displayed high- 
ly variable CA across the brain cell types iden- 
tified in the current study, except for a small 
proportion of invariable cCREs (2.0%) that 
showed accessibility in virtually all cell clus- 
ters, of which 87% were at proximal regions 
to TSSs (Fig. 2D and fig. S8E). To characterize 
the cell-type specificity of the cCREs more ex- 
plicitly, we used non-negative matrix factor- 
ization to group them into 37 modules, with 
elements in each module sharing similar cell 
type specificity profiles. Except for the first 
module (M1) which included mostly cell-type- 
invariant cCREs, the remaining 36 modules 
displayed high cell-type-restricted accessibil- 
ity (Fig. 2E and tables S9 and S10). These re- 
stricted modules were enriched for distinct sets 
of motifs recognized by known transcriptional 
regulators (table S11). For example, NEUROG2 
and ASCL1, which were enriched in module M4 
for IT neurons at cortical layer 2/3 (IT-L2/3), 
have been reported to be proneural genes, which 
are critical for cortical development (table S11) 
(34). The SOX family factors in module M35 
for OGCs are pivotal regulators of a variety of 
developmental processes (table S11) (35). These 
results lay a foundation for dissecting the gene- 
regulatory programs in different brain cell types 
and regions. 
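
As a rough illustration of the module assignment described above, the following sketch factorizes a small cCRE-by-subclass accessibility matrix with non-negative matrix factorization and assigns each cCRE to the module on which it loads most strongly. The toy matrix, the multiplicative-update solver, the rank, and all function names are assumptions made for illustration only; the procedure and parameters actually used are described in the materials and methods.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (cCREs x subclasses) into W (cCREs x modules) and
    H (modules x subclasses) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def assign_modules(w):
    """Assign each cCRE (row of W) to the module with the largest loading."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in w]

# Hypothetical toy data: 4 cCREs (rows) x 4 cell subclasses (columns).
toy_accessibility = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
]
W, H = nmf(toy_accessibility, rank=2)
print(assign_modules(W))
```

In this toy setting, rows of H play the role of the cell-type specificity profile of each module, and a cell-type-invariant module such as M1 would correspond to a row of H that is roughly flat across subclasses.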


Linking distal cCREs to target genes 


To investigate the transcriptional regulatory pro- 
grams that are responsible for cell-type-specific 
gene expression patterns in the human brain, 
we performed an integrative analysis combin- 
ing the snATAC-seq data collected in the cur- 
rent study with scRNA-seq data reported in a 
companion paper (36) from matched brain 
regions (fig. S7). We first connected 255,828 
distal cCREs to 14,861 putative target genes by 
measuring the coaccessibility across single nu- 
clei in every cell subclass (Fig. 2F, top, and 
materials and methods), which resulted in a 
total of 1,661,975 gene-cCRE pairs within 500 kb 
Fig. 2. Identification and characterization of cCREs across human brain
cell types. (A) Pie chart showing the fraction of cCREs that overlaps with 
different classes of annotated sequences in the human genome. TTS, 
transcription termination site; UTR, untranslated region. (B) Average phastCons 
conservation scores of proximal (red) and distal cCREs (yellow) and random 
genomic background (gray). (C) Stacked bar plot showing the percentage of
new cCREs defined in this study (red) and percentage of cCREs that overlapped
with public resources (gray), including the cCREs and DHSs in the SCREEN
database and cCREs identified in the human enhancer atlas (HEA) fetal and adult brain.
(D) Density map comparing the median and maximum variation of CA at each 
cCRE across cell types. Each dot represents a cCRE. (E) Heatmap showing 
association of the 42 subclasses (rows) with 37 cis-regulatory modules (top,
from left to right). Columns represent cCREs. A full list of subclass or module 
associations is provided in table S9, and the association of cCREs to modules is 
provided in table S10. (F) Schematic overview of the computational strategy used to 
identify cCREs that were positively correlated with transcription of target genes. 
(G) In total, 265,049 pairs of positively correlated cCRE and genes (highlighted in 
red) were identified (FDR < 0.05). Gray filled curve shows distribution of PCC 

for randomly shuffled cCRE-gene pairs. (H and I) Heatmap showing CA of putative 
enhancers (H) and the expression of linked genes (I, right). Genes are shown for 
each putative enhancer separately. UMI, unique molecular identifier. (J) Enrichment 
of HOMER known TF motifs in distinct enhancer-gene modules. 
of each other. Next, we identified the subset of
cCREs the accessibility of which positively cor- 
related with the expression of putative target 
genes and therefore could function as puta- 
tive enhancers in neuronal or non-neuronal types 
(Fig. 2F, bottom). This analysis was restricted 
to distal cCREs and expressed genes captured 
from 27 matched cell subclasses defined from 
integrative analysis between snATAC-seq and 
scRNA-seq (fig. S7). We revealed a total of
265,049 pairs of positively correlated cCREs
(putative enhancers) and genes at an empirically 
defined significance threshold of false discovery 
rate (FDR) < 0.05 (table S12). These included 
114,877 putative enhancers and 13,094 genes 
(Fig. 2G, fig. S9, and table S12). The median 
distance between the putative enhancers and 
the target promoters was 176,345 bp (fig. S9A).
Each promoter region was assigned to an aver- 
age of seven putative enhancers, and each puta- 
tive enhancer was assigned to two genes on 
average (fig. S9, B and C). 
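
The correlation screen described above can be sketched as follows: compute the Pearson correlation coefficient (PCC) between the accessibility of a cCRE and the expression of its candidate target gene across matched cell subclasses, build a null distribution from randomly shuffled cCRE-gene pairs, and estimate an empirical FDR at a given PCC threshold. This is a minimal sketch under stated assumptions: the function names, the shuffling scheme, and the FDR estimator (fraction of null PCCs above the threshold divided by the fraction of observed PCCs above it) are illustrative choices and are not taken from the study's code.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def shuffled_null(acc_profiles, expr_profiles, seed=0):
    """PCCs for randomly re-paired cCREs and genes (assumed null model)."""
    rng = random.Random(seed)
    shuffled = expr_profiles[:]
    rng.shuffle(shuffled)
    return [pearson(a, e) for a, e in zip(acc_profiles, shuffled)]

def empirical_fdr(observed, null, threshold):
    """Estimated FDR among observed pairs with PCC >= threshold."""
    obs_frac = sum(r >= threshold for r in observed) / len(observed)
    null_frac = sum(r >= threshold for r in null) / len(null)
    if obs_frac == 0:
        return 1.0
    return min(1.0, null_frac / obs_frac)

# Hypothetical profiles across five matched subclasses.
accessibility = [[1, 2, 3, 4, 5], [5, 1, 4, 2, 3], [2, 2, 8, 1, 1]]
expression = [[2, 4, 6, 8, 10], [1, 5, 2, 4, 3], [1, 1, 9, 2, 1]]
observed = [pearson(a, e) for a, e in zip(accessibility, expression)]
null = shuffled_null(accessibility, expression)
print(observed, empirical_fdr(observed, null, threshold=0.5))
```

In practice, the threshold would be lowered until the estimated FDR reaches the desired level (0.05 in the text), and only positively correlated pairs above it would be retained as putative enhancer-gene links.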

To investigate how cell-type-specific gene 
expression is regulated, we further classified 
these putative enhancers into 27 modules by 
using non-negative matrix factorization (tables
S13 and S14). The putative enhancers in each
module had a similar pattern of CA across cell 
subclasses (Fig. 2H), and the expression of 
putative target genes showed a correlated pattern
(Fig. 2I). This analysis revealed a large group
of 5113 putative enhancers that were linked to 
4775 genes more strongly expressed across all 
neuronal cell clusters than in non-neuronal cell 
types (module M1) (Fig. 2, H and I, and tables 
S13 and S14). These putative enhancers are 
strongly enriched for CTCF- and RFX-binding 
sites (table S15), which is consistent with what 
we previously found in the mouse cerebrum (28). 

We also uncovered modules of enhancer- 
gene pairs that were active in a more restricted 
manner (modules M2 to M27) (Fig. 2, H to J). 
For example, we identified several modules (M2 
to M7) that were associated with cortical gluta- 
matergic neurons (IT-L2/3, IT-L4, IT-L5, IT-L6-1, 
and IT-L6-2), in which the putative enhancers 
were enriched for sequence motifs recognized
by the bHLH factor NEUROD1 (Fig. 2J and
tables S13 to S15). Another example was mod- 
ule M15 associated with MSNs, in which puta- 
tive enhancers were enriched for motifs of 
MEIS factors, which play an important role in 
establishing the striatal inhibitory neurons 
(Fig. 2J and tables S13 to S15). Module M25 
was associated with microglia (MGCs). Genes 
linked to putative enhancers in this module 
were related to immune genes and the puta- 
tive enhancers were enriched for the binding 
motif for ETS-factor PU.1, a known master 
transcriptional regulator of this cell lineage 
(Fig. 2J and tables S13 to S15). This observation 
is consistent with the paradigm that cell-type- 
specific gene expression patterns are largely 
established by distal enhancer elements. 
Regional specificity of glial and neuronal cell cCREs

The single-cell atlas of CA generated in this 
study provides an excellent opportunity to char- 
acterize the heterogeneity of the gene regulatory 
programs that might underlie the specialized 
functions of glial and neuronal cells in each brain 
region. Whereas most non-neuronal cell types, 
including OGCs, OPCs, MGCs, telencephalon 
astrocytes (ASCTs), non-telencephalon astrocytes (ASCNTs),
and various vascular cells were ubiquitously 
distributed throughout the different brain 
dissections (Fig. 1J), molecular diversity has 
been recently reported in these cells in juvenile 
and adult vertebrates (37-40). We therefore
leveraged the largest collection of >900,000 single
nuclei of non-neuronal cells and the high-resolution
brain dissections in the present study (fig. S1N)
to perform in-depth characterization of the
regulatory diversity of the non-neuronal
populations.

The Uniform Manifold Approximation and 
Projection (UMAP) embeddings in the brain 
regional spaces showed a gradient among cell 
types of OGCs, OPCs, MGCs, and ASCTs (Fig. 3, 
A and B, E and F, I and J, and M and N, and 
fig. S10). We hypothesized that these gradients 
may reflect heterogeneity in cCRE usage in 
these glial cells across brain regions. We first 
calculated the averaged CA and coefficient of 
variation (CV) across 42 brain regions for every 
cCRE identified in OGCs, OPCs, MGCs, and ASCTs,
respectively (Fig. 3, C, G, K, and O, and fig. S10). 
A large number of cCREs displayed highly var- 
iable CA across the brain regions (Fig. 3, C, G, K, 
and O, and fig. S10). In total, 55,304 variable 
cCREs made up 40.1% of the total cCREs iden- 
tified in OGCs, 43,574 variable cCREs made up 
33.0% of the total cCREs identified in OPCs, 
37,962 variable cCREs made up 34.5% of the 
total cCREs identified in MGCs, and 46,979 var- 
iable cCREs made up 33.1% of the total cCREs 
identified in ASCTs. Next, using k-means clus- 
tering analysis on these variable cCREs for each 
glial cell population (Fig. 3, D, H, L, and P), we 
revealed distinct open chromatin patterns in 
OGCs, OPCs, and ASCTs from the CB. A large 
fraction of these variable cCREs showed higher
CA in the CB exclusively (Fig. 3, D, H, and P). We 
also observed loss of CA in a large number of 
cCREs in distinct brain structures (Fig. 3, D, H, 
L, and P). These variable cCREs show similar 
regional specificities across three donors. 
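
To make the variability criterion concrete, the sketch below computes the coefficient of variation (CV) of each cCRE's accessibility across brain regions and keeps the cCREs whose CV exceeds a cutoff. The cutoff value, the use of the population standard deviation, and the identifiers are assumptions for illustration; the thresholds actually applied correspond to the dashed lines in Fig. 3 and are specified in the materials and methods.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean; defined as 0 when the mean is 0."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m

def variable_ccres(accessibility, cv_cutoff):
    """Return {cCRE id: CV} for cCREs whose CV across regions >= cv_cutoff.

    accessibility maps a cCRE id to its averaged accessibility in each region.
    """
    selected = {}
    for ccre_id, per_region in accessibility.items():
        cv = coefficient_of_variation(per_region)
        if cv >= cv_cutoff:
            selected[ccre_id] = cv
    return selected

# Hypothetical accessibility of two cCREs across four regions.
toy = {
    "cCRE_uniform": [2, 2, 2, 2],    # accessible everywhere
    "cCRE_regional": [0, 0, 0, 4],   # accessible in one region only
}
print(variable_ccres(toy, cv_cutoff=1.0))
```

The retained cCREs could then be passed to a clustering step such as k-means, with regions as features, to recover groups like the CB-restricted patterns described above.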

In addition, we found a diverse population of 
both ASCT and ASCNT cells in different major 
brain structures (Fig. 3, M and N). We identi- 
fied three ASCNT cell types from subclustering 
of ASCs and one cell population restricted to 
the CB that was annotated as Bergmann glial 
cell (ASBGM) (Fig. 3, Q and R). One cell type 
(ASCNT-1) was detected mostly in the THM, 
MB, and PN, whereas the other two ASCNT cell 
types were predominantly found in the CTX, 
HIP, and cerebral nuclei. To characterize the
dynamic epigenome, we compared the open
chromatin landscapes among different cell 
types using a likelihood ratio test (Fig. 3S, 
table S16, and materials and methods), and 
identified a total number of 8,790 cCREs that 
exhibited cell-type-restricted accessibility (range: 
100 to 3,787) (Fig. 3S). A human enhancer that 
was specifically accessible in the ASCNT-1 type 
was previously validated by mouse transgenics 
to be active in the MB (Fig. 3T). We further 
performed motif analysis for differentially 
accessible regions in these cell types, finding 
enrichment of both shared and specific tran- 
scription factor (TF)-binding motifs. For exam- 
ple, we found CCAAT box-binding transcription 
factor NF1 enriched in differential regions iden- 
tified from both ASCT-1 and ASBGM, whereas 
TF motifs from nuclear receptors and zinc-finger 
families were specifically enriched in different 
types (Fig. 3U and table S17). 

We found that the cell populations of MSNs 
resolved by CA were better separated based on 
the subregions in basal ganglia, rather than D1 
and D2 types defined by the expression of two 
dopamine receptors, DRD1 and DRD2 (fig. S11, 
tables S18 and S19, and supplementary text). 


Epigenetic conservation of cCREs in mouse 
and human brain cells 


To determine how conserved the gene-regulatory 
landscapes are between human and mouse 
brains, we compared the human brain cCREs 
defined in this study with our previously pub- 
lished map of mouse cerebrum cCREs (28). We 
first performed joint clustering of 18 neuronal 
and glial cell subclasses from cerebrum (each 
with >1000 single nuclei) (Fig. 4A and fig. S12) 
(28). We used multiple molecular features, 
including “gene activity scores” at homologous 
genes, CA at homologous cCREs, and TF motif 
enrichment scores (fig. S12, A to G, and ma- 
terials and methods). Clustering based on gene 
activity scores alone did not align brain cell
types between the two species (fig. S12, A, D, 
and E) because of a lack of general conserva- 
tion of expression patterns, as reported previ- 
ously (41, 42). We identified orthologs of the 
human cCREs in the mouse genome by per- 
forming reciprocal homology searches, which 
overlapped with cCREs identified in the mouse 
cerebrum (table S20 and materials and meth- 
ods). Clustering using the CA at homologous 
cCREs alone also failed to align corresponding 
cell types, likely due to substantial CRE turnover
(fig. S12, B and F). Instead, we found that
clustering based on TF motif enrichment al- 
lowed for reasonable alignment of brain sub- 
classes between species (fig. S12, C and G). This 
observation suggests that sequence motif en- 
richment scores are conserved molecular features 
that can reliably align similar cell subclasses in 
the human and mouse brains (Fig. 4B and fig. 
S12G). These analyses also showed that the gene- 
regulatory programs of similar cell types share 
Fig. 3. Regional specificity of
cell types correlates with CA.
(A) UMAP embedding of cell types 
of OGCs. (B) UMAP embedding of 
OGCs colored by major brain 
structures. CN, cerebral nuclei. 
(C) Density scatter plot comparing 
the averaged accessibility and 
coefficient of variation across 
brain structures at each cCRE. 
Variable cCREs for OGCs are 
defined on the right side of dashed 
line. (D) Heatmap showing the 
normalized accessibility of variable 
cCREs in OGCs across major brain 
structures. (E) UMAP embedding 
of cell types of OPCs. (F) UMAP 
embedding of OPCs colored by 
major brain structures. (G) Density
scatter plot comparing the averaged
accessibility and coefficient of
variation across brain structures
at each cCRE. Variable cCREs for
OPCs are defined on the right
side of dashed line. (H) Heatmap
showing the normalized accessibility 
of variable cCREs in OPCs. (I) UMAP 
embedding of cell types of MGCs. 
(J) UMAP embedding of MGCs 
colored by major brain structures. 
(K) Density scatter plot comparing
the averaged accessibility and
coefficient of variation across brain
structures at each cCRE. Variable
cCREs for MGCs are defined
on the right side of dashed line.
(L) Heatmap showing the normalized 
accessibility of variable cCREs in 
MGCs. (M) UMAP embedding of cell
types of ASCs. (N) UMAP embedding 
of ASCs colored by major brain 
structures. (O) Density scatter plot 
comparing the averaged accessibility 
and coefficient of variation across 
brain structures at each cCRE. 
Variable cCREs for ASCs are defined 
on the right side of dashed line. 

(P) Heatmap showing the normalized 
accessibility of variable cCREs 

in ASCs. (Q) UMAP embedding of 
ASCNTs. (R) UMAP of ASCNTs, 
colored by major brain structures. 
(S) Normalized CA of 8,790 cell-type-specific cCREs. (T) Representative images of transgenic mouse embryos showing LacZ reporter gene expression under the control of
the indicated enhancers that overlapped the differential cCRE in (S) (dotted line). Images were downloaded from the VISTA database (https://enhancer.lbl.gov). (U) Top
enriched known motifs for ASC cell-type-specific cCREs.


a similar grammar and syntax of gene regula- 
tion, likely in the form of combinatorial activi- 
ties of conserved TFs. 

For ~60% of the human cCREs, mouse ge- 
nome sequences with high similarity could be 
identified (>50% of bases lifted over to the hu- 
man genome) (Fig. 4C; fig. S12, H and J; and 
table S20). Among these orthologous genome
sequences, only about half (32.8% of total
human cCREs) were also identified as open 
chromatin regions in any cell subclasses from 
the mouse cerebrum. We thus defined the 32.8% 
of human cCREs with both DNA sequence 
similarity and open chromatin conservation 
as CA-conserved cCREs, and 26.8% of human 
cCREs with only DNA sequence similarity as
CA-divergent cCREs. In addition, we defined
40.4% of human cCREs without orthologous 
genome sequences in the mouse genome as 
human-specific cCREs, although these may 
also include cCREs conserved in other primates 
or mammalian species except for the mouse 
(Fig. 4C, left, and table S21). This general pat- 
tern was consistent with what has been reported 
Fig. 4. Comparative analyses of CA between human and mouse cerebrum.
(A) UMAP coembedding of 18 cell subclasses from both human and mouse 
cerebrum. (B) UMAP coembedding of single nuclei colored by human and mouse. 
(C) Left, pie chart showing fraction of three categories of cCREs, including 
human-specific, CA-divergent, and CA-conserved cCREs. The CA-conserved 
cCREs are both DNA sequence conserved across species and have open 
chromatin in orthologous regions. The CA divergent cCREs are sequence 
conserved to orthologous regions but have not been identified as open chromatin 
regions in other species. Human-specific cCREs did not have orthologous regions 


in the mouse genome. Right, bar plot showing three categories of cCREs in 
corresponding cell subclasses from human and mouse. (D) Dot plot showing 
fraction of genomic distribution of three categories of cCREs. (E) Normalized 
accessibility at variable human-specific TEs in different cell subclasses. RPKM, 
reads per kilobase per million. (F) Average CA of LTR13A in microglia across 
different brain regions. (G) Variable CA of LTR13A across donors in microglia 
from different brain regions. (H) Invariable CA of LTR13A across donors in microglia 
from different brain regions. (I) Representative genomic locus showing CA and 
expression of LTR13A in microglia.


in other cell types between human and mouse 
(43). Next, we further broke down and per- 
formed the same analyses on the cCREs within 
the corresponding cell subclasses from both 
species. We observed a similar pattern, as the 
proportion of different categories of cCREs was
relatively consistent between various cell sub- 
classes (Fig. 4C, right, and table S21). 
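
The three-way classification of human cCREs defined above amounts to a simple decision rule, sketched below. The 50% lift-over cutoff is taken from the text; the function and argument names are hypothetical, and the flag for open chromatin in mouse is assumed to have been computed beforehand by overlapping the orthologous region with mouse cerebrum cCREs.

```python
from collections import Counter

def classify_ccre(fraction_bases_lifted, open_in_mouse):
    """Classify one human cCRE by sequence and accessibility conservation.

    fraction_bases_lifted: fraction of bases with an orthologous mouse sequence.
    open_in_mouse: True if the orthologous region is open chromatin in any
                   mouse cerebrum cell subclass.
    """
    if fraction_bases_lifted > 0.5:
        return "CA-conserved" if open_in_mouse else "CA-divergent"
    return "human-specific"

def category_fractions(labels):
    """Fraction of cCREs falling in each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}

# Hypothetical cCREs: (fraction of bases lifted over, open in mouse?).
toy_ccres = [(0.9, True), (0.8, False), (0.1, False), (0.7, True)]
labels = [classify_ccre(f, is_open) for f, is_open in toy_ccres]
print(category_fractions(labels))

# Consistency check on the reported percentages: CA-conserved (32.8%) plus
# CA-divergent (26.8%) gives the ~60% of cCREs with orthologous sequence,
# and adding human-specific cCREs (40.4%) accounts for all cCREs.
with_ortholog = round(32.8 + 26.8, 1)
all_ccres = round(32.8 + 26.8 + 40.4, 1)
print(with_ortholog, all_ccres)
```

Applying the same rule within each cell subclass, as done for Fig. 4C (right), only changes which set of mouse open chromatin regions is used to set the accessibility flag.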

We further characterized the genomic dis- 
tribution of different categories of cCREs. We 
observed that a large proportion of CA-conserved 
cCREs were located at or near the promoter-TSS
regions in the human genome. In addition,
the human-specific cCREs were enriched for 
transposable elements (TEs), such as LINEs, 
SINEs, and LTRs (Fig. 4D and table S21). Pre- 
vious reports suggest that certain transposable 
elements are active in mammalian brains, and 
could hypothetically contribute to vulnerabil- 
ity to disease (44). In support of this hypoth- 
esis, we further characterized TE-cCREs (fig. 
S13 and supplementary text), and our findings
provide evidence that distinct TE families
might be activated in specific brain cell
types. For example, LTRs, including but not
limited to LTR13A, LTR2B, and LTR5B, display 
CA in microglia but not in other subclasses of 
brain cells (Fig. 4E). The LTR13A has been re- 
ported to act as a cellular gene enhancer (45). 
We observed that the LTR13A also has variable 
accessibility in microglia populations in differ- 
ent brain regions. For example, we identified 
higher accessibility in brain regions such as 
posterior parahippocampal gyrus (TH-TL), A1C 
and primary motor cortex (M1C) from the CTX,
substantia innominata and nearby nuclei (SI)
and corticomedial nuclear group from cerebral
nuclei, and lower accessibility in MB and
CB (Fig. 4F). Furthermore, chromatin accessi- 
bilities at LTR13A in microglia from TH-TL, 
corticomedial nuclear group, and SI varied 
considerably among the donors (Fig. 4G), but
not in other brain regions (Fig. 4H). The CA
of LTR13A was associated with its activation,
as indicated by RNA expression signals (Fig. 4I),
although the biological relevance of this ob- 
servation requires further investigation in larger 
cohorts. 

Taking advantage of the integration of multi- 
modal datasets, we next aimed to better deline- 
ate the gene-regulatory programs that underlie 
the identity and function of each brain cell type 
(Fig. 5A and fig. S14, A to E). We collected single-cell
genomic datasets profiled from the M1C
and middle temporal gyrus (MTG), including
149,891 cells from scRNA-seq, 27,383 cells from
Paired-Tag sequencing (only from M1C), 55,974
cells from snATAC-seq, 10,604 cells from snmC-seq,
and 16,257 cells from snm3C-seq (Fig. 5, B and
C). We performed coembedding cell-clustering
analysis on these datasets (see the supplementary
text and materials and methods). The different
single-cell assays showed excellent agreement 
in the same coembedding space, which indicates 
the high quality of common variable features 
and the success of the integration strategy (Fig. 
5C). The integration of multimodal datasets al- 
lowed us to evaluate the information content, 
strengths, and limitations of various assays 
in the prediction of potential functional en- 
hancers. We first defined different subsets of 
distal cCREs by using the combination of single- 
cell modalities or snATAC-seq only (Fig. 5D). 
By comparing the different subsets of distal 
cCREs against validated human enhancers in
the forebrain from the VISTA Enhancer Browser
(https://enhancer.lbl.gov) (46), we observed
that the highest gain of enrichment came with
the incorporation of H3K27ac modification signals
(Fig. 5D). We also found that incorporating
sequence conservation information further
improved the prediction of potential functional
enhancers. The above observations suggest 
that incorporation of CA, histone modifica- 
tion information such as H3K27ac, coaccessi- 
bility between distal elements and promoters, 
and sequence conservation could improve the 
prediction of functional enhancers. 
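
The enrichment comparison above rests on 2 x 2 contingency tables: cCREs inside or outside a given subset, crossed with overlap or no overlap with VISTA-validated forebrain enhancers. The sketch below computes the odds ratio and a one-sided Fisher's exact P value for such a table using only the standard library. The counts are invented for illustration, and the choice of a one-sided test is an assumption; the study's exact test settings are given in the materials and methods.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null with fixed margins.

    a: cCREs in the subset that overlap validated enhancers
    b: cCREs in the subset that do not overlap
    c: cCREs outside the subset that overlap
    d: cCREs outside the subset that do not overlap
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom

# Hypothetical counts for one subset of distal cCREs.
a, b, c, d = 30, 970, 50, 8950
print(odds_ratio(a, b, c, d), fisher_one_sided(a, b, c, d))
```

Repeating the calculation for each subset (snATAC-seq only, plus DMRs, plus coaccessibility, plus H3K27ac, plus chromatin loops) would yield the kind of odds ratios compared in Fig. 5D.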

We characterized the gene program in VIP-4, 
one cell type of VIP+ neurons that showed distinct
CA at the CHRNA2 locus (Fig. 5E). The
gene CHRNA2 encodes a subunit of the nicotinic
cholinergic receptor, which is involved in fast
synaptic transmission. The colocalization of
the VIP+ neuron marker gene BTBD11 and
CHRNA2 in an RNAscope (47) in situ hybridization
experiment first validated the existence
of the VIP-4 type in the human CTX (fig.
S14F). We also noticed that the expression of
CHRNA2 was detected in human VIP+ cell
types, but not in any VIP+ cell types identified
from mouse brain (Fig. 5F) (8, 42, 48). We explored
whether the human-specific expression
of CHRNA2 was regulated by specific cCREs in 
the VIP-4 type. We identified a total of 40,086 
differential cCREs between seven VIP+ cell types
(Fig. 5G and table S22). One differential cCRE
located downstream of the gene CHRNA2
showed higher accessibility in VIP-4 than in
OPC (Fig. 5H, ATAC tracks for seven VIP+ cell
types and aggregated signals for VIP and OPC).
This cCRE was also characterized as a human-specific
cCRE in VIP+ neurons (Fig. 4C and table
S20). The specific accessibility of this cCRE and
the promoter of CHRNA2 in VIP+ neurons was
supported by mCG signals from snmC-seq (Fig. 
5H, mCG tracks). This cCRE was predicted as a 
putative enhancer that regulates the expression 
of CHRNA2 (Fig. 5H, magenta arcs). The poten- 
tial active function of this CRE was further 
supported by H3K27ac modification in the 
cells (Fig. 5H, K27ac tracks). We additionally 
confirmed the chromatin interactions between 
this cCRE and the promoter of CHRNA2 (Fig.
5I). Taken together, the above data suggested
that this human-specific cCRE could be an 
enhancer that regulates the distinct expres- 
sion of CHRNA2 in the VIP-4 type from the hu- 
man brain. 


Sequence changes underlie epigenetic 
divergence in cCREs in distinct brain cell types 


We characterized different categories of cCREs
(figs. S15 to S17, table S23, and supplementary 
text) and hypothesized that the epigenetic di- 
vergence of cCREs could be partly due to evo- 
lutionary changes in DNA sequences. To test 
this hypothesis, we picked IT-L2/3 neurons, 
LAMP5+ interneurons, and MGC as representative
cell subclasses, and trained gapped-kmer
SVM (gkmSVM) classifiers (49) from the DNA 
sequences in cCREs (Fig. 6, A and B, and ma- 
terials and methods). These models achieved 
excellent performance [area under receiver operating
characteristic curve (AUROC) ranging
from 0.856 to 0.928, and area under precision- 
recall curve (AUPRC) ranging from 0.850 to 
0.912] in the prediction of open chromatin re- 
gions within the corresponding species (Fig. 6, 
A and B). Next, we predicted different catego- 
ries of mouse cCREs using gkmSVM models 
trained with human DNA sequence at cCREs 
in corresponding cell subclasses (Fig. 6C). These 
models also achieved high accuracy (ranging 
from 0.83 to 0.91) in the prediction of CA- 
conserved mouse cCREs and slightly lower 
accuracy in CA-divergent (ranging from 0.79 to 
0.82) and mouse-specific cCREs (ranging from
0.71 to 0.80) (Fig. 6C). Similarly, the gkmSVM 
models trained with mouse DNA sequences 
achieved high accuracy in predicting human
CA-conserved cCREs (ranging from 0.89 to 
0.93), and slightly lower accuracy for human 
CA-divergent (ranging from 0.75 to 0.86) and 
human-specific cCREs (ranging from 0.77 to
0.84) (Fig. 6D). The human CA-divergent cCREs 
that failed to be predicted from mouse gkmSVM 
models have a potential function in regulating 
genes involved in specific biological processes, 
including glutamate receptor signaling path- 
way [Gene Ontology (GO) no. 000721], synaptic 
transmission - GABAergic (GO no. 0051932), and 
various fatty acid elongation (GO nos. 0019367, 
0019368, and 0034625) in IT-L2/3 neurons, 
LAMP5+ interneurons, and MGC, respectively
(fig. S18, table S24, and supplementary text). 
These results suggested that the regulatory di- 
vergence is at least in part due to the evolution 
of DNA sequences. 
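
For readers unfamiliar with gapped k-mer features, the sketch below shows the kind of sequence representation on which gkmSVM classifiers are built: every window of length l is expanded into patterns that keep k informative positions and mask the rest. This is a naive counting illustration only, not the gkmSVM implementation, which evaluates the kernel far more efficiently; the values l = 4 and k = 3, the mask character "N", and the function names are assumptions chosen for readability.

```python
from collections import Counter
from itertools import combinations

def gapped_kmers(seq, l=4, k=3):
    """Count gapped k-mers: length-l windows with k kept positions.

    Masked (gap) positions are written as 'N'.
    """
    counts = Counter()
    masks = list(combinations(range(l), k))
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in masks:
            pattern = "".join(window[i] if i in keep else "N"
                              for i in range(l))
            counts[pattern] += 1
    return counts

def gkm_similarity(seq_a, seq_b, l=4, k=3):
    """Unnormalized similarity: dot product of gapped k-mer count vectors."""
    counts_a = gapped_kmers(seq_a, l, k)
    counts_b = gapped_kmers(seq_b, l, k)
    return sum(counts_a[p] * counts_b[p] for p in counts_a)

# Two hypothetical sequences differing at a single base still share
# gapped k-mers in which the differing position is masked.
print(gapped_kmers("ACGT"))
print(gkm_similarity("ACGT", "ACGA"))
```

Because a single substitution leaves many gapped patterns unchanged, such features tolerate small sequence differences, which is one reason a model trained on one species can be applied to orthologous sequences from another.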


Predicting disease-relevant cell types 
for neuropsychiatric disorders


Genome-wide association studies (GWASs) have 
identified genetic variants that are associated 
with many mental diseases and traits (table 
S25), but >90% of variants are located in
non-protein-coding regions of the genome (4, 35).
Previous studies have shown that noncoding 
risk variants are enriched in cCREs active in 
disease-relevant cell types (6, 7, 50). Leveraging 
the newly annotated cell-type-resolved human 
brain cCREs, we predicted the cell types rele- 
vant to the different neuropsychiatric disor- 
ders. We performed linkage disequilibrium 
score regression (LDSC) analysis to determine 
whether the genetic heritability of DNA variants
associated with neuropsychiatric disorders
was significantly enriched within cCREs
showing CA in the major brain cell types in the 
present study (table S25 and materials and me- 
thods). We found significant associations between 
19 mental diseases and traits (tables S25 and S26) 
with the open chromatin landscapes in one or 
more cell types that we identified (Fig. 7A and 
materials and methods) and few associations 
for non-central nervous system traits (fig. S19A).
In particular, we observed widespread and 
strong enrichment of genetic variants linked 
to neuropsychiatric disorders such as schizo- 
phrenia and bipolar disorder within acces- 
sible cCREs across various neuronal cell types 
(Fig. 7A and table S26). Tobacco use disorder 
and alcohol usage were associated with spe- 
cific neuronal cell types in basal ganglia, which 
were previously implicated in addiction (57). 
Another example is neuroticism, whose association
was restricted to IT neurons from
the CTX (Fig. 7A and table S26). In addition,
the risk variants from AD were significantly 
enriched in the cCREs found in microglia, but 
not in other cell types (Fig. 7A and table S26). 

We further provided breakout reports of 
LDSC analysis by using three categories of 
cCREs defined above (Fig. 4C). The strongest 
associations between cell subclasses and GWAS 
traits were found in the analysis of epigenetic 
conserved elements (Fig. 7B and table S27). For 
example, the risk variants in schizophrenia 
Fig. 5. Integration of multimodal single-cell datasets of cortical cells.

(A) Summary of single-cell technologies and multimodal data integration strategies. 
(B) UMAP embedding and integrative clustering analysis of 18 major cell 

types. (C) Coembedding of multimodal single-cell datasets showing excellent 
agreement. Asterisks indicate assays using an unbiased sampling strategy. 

(D) UpSet plot showing the enrichment of VISTA validated enhancer in different 
subsets of distal cCREs, which is defined by combining information and 
features from single-cell modalities or snATAC-seq only. These subsets include: 
(i) cCREs identified from snATAC-seq only; (ii) snATAC-seq cCREs overlapped 
with differentially methylated regions (DMRs) identified from snmC-seq; 

(iii) snATAC-seq distal cCREs that were predicted to be coaccessible with 
promoter across cells; (iv) snATAC-seq distal cCREs marked by H3K27ac signals
from Paired-Tag, a method for joint single-cell analysis of histone modification 
and gene expression; and (v) snATAC-seq distal cCREs predicted to be coaccessible 
with promoters, and linked by chromatin loops identified in snm3C-seq assays. 
Then, we filtered out validated human enhancers in the forebrain from VISTA 


Li et al., Science 382, eadf7044 (2023) 13 October 2023 


enhancer browser (https://enhancer.lbl.gov). By overlapping different subsets 
of distal cCREs, we observed various enrichment (odds ratios from Fisher's 
exact test) with combinations of different assays and features. *P < 0.05, ***P < 
0.001, Fisher’s exact test. (E) Left, UMAP embedding of VIP+ GABAergic cell
types. Upper right is colored by donors. Bottom right shows normalized
accessibility at gene CHRNA2. (F) Expression of gene VIP and CHRNA2 in
human VIP+ cell types and expression of gene Vip and Chrna2 in mouse VIP+ cell
types from the Allen Cell Types Database (RNA-seq data). (G) Normalized
CA of 40,086 VIP+ cell-type-specific cCREs. (H) Genome browser track view at
the CHRNA2 locus as an example for candidate enhancers predicted from
single-cell multimodal datasets. Shown are CA profiles from snATAC-seq, DNA
methylation signals (mCG) from snm3C-seq, and histone modification signals
(H3K27ac) from Paired-Tag for several VIP+ neurons and OPC. Magenta arcs
represent the predicted enhancer for gene CHRNA2. (I) Triangle heatmap
showing chromatin contacts in VIP+ neurons and OPCs derived from snm3C-seq
data at the CHRNA2 locus. 
Fig. 6. Epigenetic conservation and divergence of human orthologous
cCREs. (A) Receiver operating characteristic (ROC) curve and area under curve
(AUC) from gkmSVM models trained for representative human and mouse cell
types. (B) Precision-recall curve (PRC) and AUC from gkmSVM models
trained for representative human and mouse cell types. (C) Prediction for 


showed the most significant enrichment in 
epigenetic conserved elements (Fig. 7C and 
supplementary text). Fewer associations were 
observed in the analysis of epigenetic diver- 
gent elements, and most GWAS traits showed 
no associations when human-specific elements 
were used for LDSC (Fig. 7B). 

LDSC analysis using human-specific elements 
revealed an association between AD and microg- 
lia (Fig. 7B and table S27), raising the possibility 
that the AD-related risk variants could reside 
in human-specific regulatory elements and 
contribute to human-specific gene regulation 
programs in microglia (52) (fig. S20 and sup- 
plementary text). This observation suggests 
potential limitations of animal models of AD 
in revealing disease pathology in humans (53). 
For example, one AD risk locus contains mul- 
tiple microglia-specific cCREs, with no homol- 
ogous sequence in the mouse genome (fig. 
S19B). One of these cCREs harboring AD-risk 
variant rs6733839 has been predicted to be a
microglia-specific enhancer that can regulate
the expression of the BIN1 gene, and its function
was supported by both H3K27ac modifica- 
tion and a previous validation experiment (fig. 
S19B) (54). 


Deep learning models predict the influence 
of risk variants on gene regulation 


To further understand how risk variants con- 
tribute to the function of regulatory elements, 
we used deep learning models to predict CA 
from DNA sequences (Fig. 7E; fig. S21, A to C; 
and materials and methods). The deep learning 
model architecture was inspired by Enformer 
(55), which adopts the attention-based Transformer
architecture; the Transformer can better capture syntactic
forms (e.g., the order and combination of words 
in a sentence) and outperforms most existing 
models in natural language-processing tasks 
(56). We trained the deep learning model called 
Epiformer on the normalized pseudobulk ATAC- 
seq profiles in human microglia and multiple 
cell subclasses. To demonstrate the utility of
the resulting deep learning models, we focused
on a microglia-specific cCRE that was predicted
to regulate the expression of the TSPAN14 gene.
This cCRE harbors two AD risk variants, the 
functions of which were investigated in a sep- 
arate study (57) (Fig. 7D). The deep learning 
model successfully predicted the cell-type-specific
accessibility of cCREs at the TSPAN14
locus with a Pearson correlation coefficient 
(PCC) of 0.72 (Fig. 7, E and F). By contrast, deep 
learning models trained using ATAC-seq profiles 
from other cell types failed to predict the CA 
profiles of these cCREs (fig. S21D). To predict 
the regulatory effects of the risk variant, we 
then performed in silico mutagenesis on the 
above microglia-specific enhancer near TSPAN14
and compared the changes of accessibility predicted
from reference and altered DNA sequences.
Every nucleotide within this 500-bp
enhancer was mutated in silico, and the influence
on accessibility was measured by assessing
the difference between the predicted accessibil- 
ity for the reference and altered sequences (Fig. 
7G and table S28). Among these in silico single- 
nucleotide mutations, most in the flanking 
regions did not affect the predicted accessi- 
bility; however, a few nucleotide substitutions 
increased or decreased the predicted CA (Fig. 
7G and table S28). The DNA sequence most 
negatively associated with the predicted ac- 
cessibility was predicted to contain binding 
motifs for TFs TFAP2A/B/C, ELF1, and FOXO1 
(Fig. 7G). Among them, FOXO1 has been re- 
cently reported to be a critical element for the 
regulation of microglial cell physiology and the 
maintenance of the brain homeostasis (58). 
The model predicted that the nucleotide sub- 
stitutions (C > A) of AD risk variant rs7922621 
would decrease the accessibility of the microglia- 
specific enhancer (Fig. 7H), whereas the AD 
risk variant rs7910643 would barely influence 
CA (Fig. 7I). These predictions matched the experimental
results obtained from microglia-like
cells differentiated from two different human
pluripotent stem cell lines in which the two

mouse epigenetic conserved, mouse CA-divergent, and mouse-specific

cCREs from gkmsvm models trained in corresponding human cell subclasses. 
(D) Prediction for human epigenetic conserved, human CA-divergent, and 
human-specific cCREs from gkmsvm models trained in corresponding mouse 


variants were modified using prime editing
and tested for effects on TSPAN14 expression
(57). These results provide evidence that deep 
learning approaches might be able to capture 
the gene-regulatory code and interpret risk 
variants associated with complex traits and 
diseases. 
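
The in silico mutagenesis procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the Epiformer model: `toy_accessibility` is a hypothetical stand-in for a trained predictor (it simply counts an arbitrary motif), and the reference sequence is made up.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_accessibility(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts an arbitrary motif."""
    motif = "GGAA"  # illustrative only, not a motif reported in the study
    return float(sum(seq[i:i + len(motif)] == motif
                     for i in range(len(seq) - len(motif) + 1)))


def in_silico_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution against the reference.

    Returns {(position, alt_base): predicted(alt) - predicted(ref)}.
    """
    ref_score = predict(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = predict(mutated) - ref_score
    return effects


reference = "TTGGAATT"  # made-up 8-bp sequence containing the toy motif once
effects = in_silico_mutagenesis(reference, toy_accessibility)
```

Negative values mark substitutions that lower the predicted signal, which corresponds to the blue cells in a mutagenesis heatmap; with a real model, `predict` would return the predicted accessibility of the enhancer.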


Discussion 


In-depth knowledge of the transcriptional reg- 
ulatory program in brain cells would not only 
improve our understanding of the molecular 
inner workings of neurons and non-neuronal 
cells, but could also shed light on the patho- 
genesis of a spectrum of neurological disorders. 
Here, we report a comprehensive profiling of 
CA at single-cell resolution in 42 human brain 
regions. The CA of 544,735 cCREs was deter- 
mined in >1.1 million nuclei. Taking advan- 
tage of our high-resolution brain dissections, 
we examined the regional specificity in CA of 
cell types in the human brain and showed that 
most exhibited strong regional specificity. The 
described cCRE atlas (http://catlas.org) repre- 
sents a rich resource for the neuroscience com- 
munity to understand the molecular patterns 
that underlie the diversification of brain cell 
types as a complement to other molecular and 
anatomical data. 

The comparison of open chromatin landscapes
between human and mouse cell types
uncovered a substantial degree of evolutionary 
changes involving both sequence turnovers and 
regulatory divergence. We found that ~30% of
the cCREs were conserved in both sequence
and CA, which is likely an underestimate
of the degree of conservation, because the
list of brain cCREs in each species will likely
grow as more cells and brain regions are
assayed. We observed that the human-specific
cCREs were enriched for TEs. Whether these 
TE-cCREs may serve as new enhancers to drive 
primate-specific gene expression remains to be
demonstrated (59). In addition, TEs are reactivated
during aging, neurodegeneration, and
neuropsychiatric disorders, and their role in the
disease pathology needs to be further elucidated
(60) as future datasets are collected from more
species and donors in various conditions.

Fig. 7. Interpreting noncoding risk variants of neurological disorders and
traits. (A) Heatmap showing enrichment of risk variants associated with
neurological disorders and traits from GWASs in human cell-type-resolved cCREs.
Cell-type-specific LDSC analysis was performed using GWAS summary statistics.
Total cCREs identified independently from each human cell type were used as input
for analysis. P values were corrected using the Benjamini-Hochberg procedure
for multiple tests. FDRs of LDSC coefficients are shown. *FDR < 0.05; **FDR < 0.01;
***FDR < 0.001. Detailed results are shown in table S26. (B) Heatmap showing
enrichment of risk variants associated with mental disorders and traits in three
categories of cCREs. Detailed results are shown in table S27. (C) Fine mapping and
molecular characterization of schizophrenia (SCZ) risk variants in different
categories of cCREs from multiple neuronal types. Genome browser tracks
(GRCh38) display CA profiles from snATAC-seq and histone modification signals
(H3K27ac) from Paired-Tag, and magenta arcs represent the predicted enhancer for
gene TSNARE1. (D) Molecular characterization of AD risk variants in a microglia-specific
enhancer. Genome browser tracks (GRCh38) display CA profiles from snATAC-seq
and histone modification signals (H3K27ac) from Paired-Tag, and magenta arcs represent
the predicted enhancer for gene TSPAN14. (E) Schematic diagram of the deep
learning model for predicting chromatin accessibility. (F) CA at TSPAN14 enhancer
loci predicted in human microglia. (G) In silico nucleotide mutagenesis
influenced the prediction of accessibility. Larger signals (dark red) represent a
higher accessibility prediction on altered sequence, and lower signals (dark blue)
represent lower accessibility on altered sequence. Predicted JASPAR CORE 2022
motifs are listed at the bottom. (H) Lower accessibility predicted on TSPAN14
enhancer with risk variant rs7922621 C > A. (I) Less accessibility change
predicted on TSPAN14 enhancer with risk variant rs7910643 G > A.

GWASs have been widely used to enhance 
our understanding of polygenic human traits 
and to reveal clinically relevant therapeutic tar- 
gets for neuropsychiatric disorders. However, 
our ability to interpret the risk variants has 
been hampered by tissue heterogeneity and in- 
complete functional annotation of noncoding 
regulatory elements. By leveraging epigenetic 
conserved, divergent, and human-specific cCREs 
identified from various cell types and compared 
between human and mouse, we prioritized 
likely causal variants in linkage disequilibrium, 
linked distal cCREs to putative target genes, 
and predicted motifs altered by risk variants 
using cutting-edge deep learning methods. We 
revealed hundreds of cell-type trait associa- 
tions and created a framework to systematically 
interpret noncoding risk variants. 

The present study is limited to three subjects, 
and each brain region was surveyed at a mod- 
est depth. Future studies will be necessary to 
investigate the variations more deeply in chro- 
matin landscapes across different individuals, 
genders, age groups, and populations. Further, 
application of single-cell multiomics and spa- 
tial transcriptomics tools will greatly acceler- 
ate the identification of rare human brain cell 
types and their gene-regulatory programs. 


Materials and methods 
Tissue preparation and nuclei isolation 


Tissues from three human donors (males aged 
29, 42, and 58, right hemisphere) with a postmortem
interval <12 hours and RIN score >7.5
were obtained from the Allen Institute. Tissue 
included 42 regions spanning diverse brain 
structures, dissected using standard architec- 
tonic landmarks and guided by the Allen Brain 
Human Brain Atlas (27). Tissue blocks were 
separated into 100- to 150-mg pieces in RNase- 
free glass dishes on dry ice and stored at -80°C 
until they were homogenized. 

We performed standard library prep on 108 
samples (79%), in which we harvested clean 
cell nuclei (fig. S22A and table S2). Frozen tis- 
sue was homogenized in 4.5 ml of lysis buffer 
(0.32 M sucrose, 0.1% Triton X-100, 5 mM CaCl2,
3 mM Mg(Ac)2, 0.1 mM EDTA pH 8.0, 10 mM
Tris-HCl, pH 8.0, 1 mM DTT, Roche cOmplete 
Mini Protease Inhibitor, and 24 U/ml Promega 
RNasin Plus) using glass Dounce homogenizers. 
After sequencing, a median of 9535 single nuclei 
per sample passed the quality control (fig. S22B). 

For a subset of samples from subcortical regions
and HIP (n = 29), we performed iodixanol-gradient
purification because of the presence of a large
amount of debris (fig. S22C and table S2), which
may affect the cutting efficiency of Tn5 and
result in low data quality (fig. S22D). For iodixanol-gradient
purification of nuclei, tissue was Dounce
homogenized in 5 ml NIMT buffer (0.25 M
sucrose, 0.1% Triton X-100, 25 mM KCl, 5 mM
MgCl2, 10 mM Tris-Cl pH 8.0, 1 mM DTT, 20 U/ml
SUPERase IN, 40 U/ml RNAse OUT), mixed
with 50% iodixanol buffer (50% iodixanol
(Optiprep), 25 mM KCl, 5 mM MgCl2, 20 mM
Tris-HCl pH 8.0) to a concentration of 20%
iodixanol, and layered onto 25% iodixanol buffer
(25% iodixanol (Optiprep), 0.125 M sucrose,
12.5 mM KCl, 2.5 mM MgCl2, 5 mM Tris-Cl pH
8.0). Samples were centrifuged at 10,000g for 
20 min at 4°C in a Beckman Optima-80K Ultra- 
centrifuge with SW-41-Ti swinging bucket rotor. 
The supernatant was carefully aspirated before 
nuclei rehydration. All steps were performed on 
ice using nuclease-free equipment. The iodixanol-gradient
purification removed the debris
and yielded datasets of significantly better quality
(median of 7853 nuclei per sample)
(fig. S22E).


snATAC-seq using combinatorial indexing 


snATAC-seq was performed as described with 
steps optimized for automation and modifica- 
tions in permeabilization and sort buffers (24). 
A step-by-step protocol for library preparation
is available here:
https://www.protocols.io/view/sequencing-open-chromatin-of-single-cell-nuclei-sn-pjudknw/abstract.

Brain nuclei were pelleted with a swinging 
bucket centrifuge (500g, 5 min, 4°C; 5920R, 
Eppendorf). Nuclei pellets were resuspended 
in 1 ml nuclei permeabilization buffer (10 mM
Tris-HCl (pH 7.5), 10 mM NaCl, 3 mM MgCl2,
0.1% Tween-20 (Sigma), 0.1% IGEPAL-CA630
(Sigma), 0.01% Digitonin (Promega) and 1% BSA
(Proliant 7500804) in molecular biology-grade
water) and pelleted again (500g, 5 min, 4°C;
5920R, Eppendorf). Nuclei were resuspended in
1 ml of high-salt tagmentation buffer (36.3 mM
Tris-acetate (pH 7.8), 72.6 mM potassium-acetate,
11 mM Mg-acetate, 17.6% DMF) and counted
using a hemocytometer. Concentration was
adjusted to 2000 nuclei/9 µl, and 2000 nuclei
were dispensed into each well of a 96-well
plate. For tagmentation, 1 µl of barcoded Tn5
transposomes was added using a BenchSmart
96 (Mettler Toledo, RRID:SCR_018093; table
S26), mixed five times, and incubated for 60 min
at 37°C with shaking (500 rpm). To inhibit the
Tn5 reaction, 10 µl of 40 mM EDTA was added
to each well with a BenchSmart 96 (Mettler
Toledo, RRID:SCR_018093), mixed 10 times,
and the plate was incubated at 37°C for 15 min
with shaking (500 rpm). Next, 10 µl of 3x sort
buffer (8% BSA, 3 mM EDTA in PBS) was added
using a BenchSmart 96 (Mettler Toledo,
RRID:SCR_018093). If processing two or more samples
per day, tagmentation was performed with
different sets of barcodes in separate 96-well
plates. After tagmentation, nuclei from individual
plates were pooled together, added to
a FACS tube, and stained with 3 µM Draq7 (Cell
Signaling). Using a SH800 (Sony), 20 nuclei per
sample were sorted per well into eight 96-well
plates (total of 768 wells, 15,360 nuclei per sample)
containing 10.5 µl EB (25 pmol primer i7,
25 pmol primer i5, 200 ng BSA (Sigma)). Preparation
of sort plates and all downstream
pipetting steps were performed on a Biomek
i7 Automated Workstation (Beckman Coulter,
RRID:SCR_018094). After addition of 1 µl 0.2%
SDS, samples were incubated at 55°C for 7 min
with shaking (500 rpm). Next, 1 µl 12.5% Triton-X
was added to each well to quench the SDS.
Next, 12.5 µl NEBNext High-Fidelity 2x PCR
Master Mix (NEB) were added and samples 
were PCR-amplified (72°C 5 min, 98°C 30 s, 
(98°C 10 s, 63°C 30 s, 72°C 60 s) x 12 cycles, held 
at 12°C). After PCR, contents in all wells were 
combined. Libraries were purified according to 
the MinElute PCR Purification Kit manual (Qiagen) 
using a vacuum manifold (QIAvac 24 plus, 
Qiagen) and size selection was performed with 
SPRI Beads (Beckman Coulter, 0.55x and 1.5x).
Libraries were purified one more time with
SPRI Beads (Beckman Coulter, 1.5x). Libraries
were quantified using a Qubit fluorimeter (Life
Technologies, RRID:SCR_018095) and the nucleosomal
pattern was verified using a TapeStation
(High Sensitivity D1000, Agilent). Libraries
generated were sequenced on a NextSeq500 
(RRID:SCR_014983, Illumina), HiSeq4000 (RRID: 
SCR_016386, Illumina) or NovaSeq6000 (RRID: 
SCR_016387, Illumina) using custom sequencing
primers with the following read lengths: 50 +
10 + 12 + 50 (Read1 + Index1 + Index2 + Read2).
Indexing primers and sequencing primers are 
in table S29. 


Processing and alignment of sequencing reads 


Paired-end sequencing reads were demulti- 
plexed and the cell index transferred to the 
read name. Sequencing reads were aligned to the
hg38 reference genome using bwa (61). After
alignment, we used the R package ATACseqQC
(1.10.2) (62) to check the fragment length distribution,
which is characteristic of ATAC-seq
libraries. Next, we combined the sequencing
reads to fragments, and for each fragment we 
performed the following quality control: (i)
keep only fragments with a quality score of MAPQ > 30;
(ii) keep only the properly paired fragments
with length <1000 bp; and (iii) remove PCR duplicates
with SnapTools (https://github.com/r3fang/SnapTools)
(63). Reads were sorted based
on the cell barcode in the read name. 
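
The three fragment-level filters can be illustrated with a short sketch. This is not the SnapTools implementation: the `Fragment` record is a hypothetical structure, and treating fragments with the same barcode and coordinates as PCR duplicates is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment record; field names are illustrative."""
    barcode: str
    chrom: str
    start: int
    end: int
    mapq: int
    properly_paired: bool


def filter_fragments(fragments: Iterable[Fragment],
                     min_mapq: int = 30,
                     max_length: int = 1000) -> List[Fragment]:
    seen = set()
    kept = []
    for frag in fragments:
        # (i) keep only fragments with MAPQ > 30
        if frag.mapq <= min_mapq:
            continue
        # (ii) keep only properly paired fragments shorter than 1000 bp
        if not frag.properly_paired or (frag.end - frag.start) >= max_length:
            continue
        # (iii) assumed duplicate rule: same cell barcode and same coordinates
        key = (frag.barcode, frag.chrom, frag.start, frag.end)
        if key in seen:
            continue
        seen.add(key)
        kept.append(frag)
    # sort by the cell barcode, as in the final step of the text
    return sorted(kept, key=lambda frag: frag.barcode)


example = [
    Fragment("B", "chr1", 100, 300, 60, True),
    Fragment("A", "chr1", 100, 300, 60, True),
    Fragment("A", "chr1", 100, 300, 60, True),    # duplicate
    Fragment("A", "chr1", 500, 2000, 60, True),   # too long
    Fragment("A", "chr2", 10, 200, 20, True),     # low MAPQ
    Fragment("A", "chr2", 10, 200, 60, False),    # not properly paired
]
kept = filter_fragments(example)
```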


TSS enrichment calculation 


Enrichment of ATAC-seq accessibility at TSSs 
was used to quantify data quality without the 
need for a defined peak set. The method for 
calculating enrichment at TSSs was adapted
from a previously described approach. TSS positions were
obtained from the GENCODE database v29
(hg38) (64). Briefly, Tn5-corrected insertions
(reads aligned to the positive strand were shifted
+4 bp and reads aligned to the negative strand
were shifted -5 bp) were aggregated ±2000 bp
relative (TSS strand-corrected) to each specific
TSS genome-wide. Then this profile was normalized
to the mean accessibility ±1900 to 2000 bp
from the TSS and smoothed every 11 bp. The
max of the smoothed profile was taken as the 
TSS enrichment. 
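
The calculation can be sketched as follows. The sketch assumes that insertion coordinates are already Tn5-shifted, takes the background as the outermost 100 bp on each side of the ±2000-bp window, and uses a simple running mean for the 11-bp smoothing; the function name and input layout are illustrative rather than taken from the original pipeline.

```python
from typing import Dict, List, Tuple


def tss_enrichment(insertions_by_chrom: Dict[str, List[int]],
                   tss_list: List[Tuple[str, int, str]],
                   flank: int = 2000,
                   edge: int = 100,
                   smooth: int = 11) -> float:
    width = 2 * flank + 1
    profile = [0.0] * width

    # Aggregate insertions around every TSS, flipping minus-strand offsets
    for chrom, tss_pos, strand in tss_list:
        for ins in insertions_by_chrom.get(chrom, []):
            offset = ins - tss_pos
            if strand == "-":
                offset = -offset
            if -flank <= offset <= flank:
                profile[offset + flank] += 1.0

    # Normalize to the mean signal in the outermost `edge` bp on both sides
    background = profile[:edge] + profile[-edge:]
    bg_mean = sum(background) / len(background)
    if bg_mean == 0:
        raise ValueError("no insertions in the flanking background windows")
    normalized = [value / bg_mean for value in profile]

    # Running mean over `smooth` bp, then take the maximum
    half = smooth // 2
    smoothed = []
    for i in range(width):
        window = normalized[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return max(smoothed)


# Toy data: flat background of one insertion per bp plus a pile-up at the TSS
toy_insertions = {"chr1": list(range(3000, 7001)) + [5000] * 110}
toy_score = tss_enrichment(toy_insertions, [("chr1", 5000, "+")])
```

The nested loop is quadratic and only suitable for illustration; a production implementation would index insertions by position.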


Doublet removal 


We used a modified version of Scrublet (65) to 
remove potential doublets for every dataset 
independently. First, we counted the reads on the
gene body and 2 kb upstream of the TSS for every
single nucleus that passed quality control. Next,
cell-by-gene count matrices were used as input
and, with default parameters, the doublet scores were
calculated for both observed nuclei $\{x_i\}$ and simulated
doublets $\{y_i\}$ using Scrublet (65). Then,
a threshold $\theta$ was selected based on the distribution
of $\{y_i\}$, and observed nuclei with a doublet
score larger than $\theta$ were predicted as doublets. To
determine $\theta$, we fit a two-component mixture
distribution by using the function normalmixEM
from the R package mixtools (66). The lower component
contained most of the embedded doublet
types, and the other component contained
most of the neotypic doublets (collisions between
nuclei from different clusters). We selected
the threshold $\theta$ at which $p_1 \cdot \mathrm{pdf}(\theta, \mu_1, \sigma_1) =
p_2 \cdot \mathrm{pdf}(\theta, \mu_2, \sigma_2)$. This value suggested that
the nuclei have the same chance of belonging to
both classes. 
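
Given fitted mixture parameters, the threshold can be located numerically. The sketch below assumes that the mixing proportions, means, and standard deviations have already been estimated (for example, by normalmixEM) and that the crossing point lies between the two component means; the parameter values in the example are made up.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def doublet_threshold(p1: float, mu1: float, sigma1: float,
                      p2: float, mu2: float, sigma2: float,
                      tol: float = 1e-10) -> float:
    """Bisection for theta with p1*pdf(theta; mu1, s1) == p2*pdf(theta; mu2, s2)."""
    lo, hi = sorted((mu1, mu2))

    def diff(x: float) -> float:
        return p1 * normal_pdf(x, mu1, sigma1) - p2 * normal_pdf(x, mu2, sigma2)

    if diff(lo) * diff(hi) > 0:
        raise ValueError("weighted densities do not cross between the means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Made-up symmetric example: the crossing point lies midway between the means
theta = doublet_threshold(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```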


Clustering and cluster annotation 


We used an iterative clustering strategy using 
the SnapATAC package (63) with modifications 
as detailed below. For round 1 clustering, we 
clustered and finally merged single nuclei to 
three main cell classes: non-neurons, GABAergic 
neurons, and glutamatergic neurons. For each 
main cell class, we performed another round of 
clustering to identify major cell types. Finally, 
for each major cell type, we performed a
third round of clustering to find sub-types. A 
detailed description for every step is listed 
below. 


Step 1: Nuclei filtering 


Nuclei with ≥500/1000 uniquely mapped fragments
and TSS enrichment >5/7 were retained
for each individual dataset.


Step 2: Doublet removal 


The potential barcode collisions were also re- 
moved for individual datasets. 


Step 3: Feature bin selection 


First, we calculated a cell-by-bin matrix at 500-kb 
resolution for every sample independently and 
subsequently merged the matrices. Second, 
we converted the cell-by-bin count matrix to a 
binary matrix. Third, we filtered out any bins 
overlapping with the ENCODE blacklist (hg38, 
https://github.com/Boyle-Lab/Blacklist) (67).
Fourth, we focused on bins on chromosomes
1-22, X, and Y. We removed the top 5% of bins
with the highest read coverage from the count 
matrix. 
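
The bin selection steps can be sketched as follows. The data layout (a dense list of per-cell rows, bins named by chromosome and start position) is hypothetical, and ranking bins by coverage in the binarized matrix is an assumption; SnapATAC stores these matrices in sparse form.

```python
from typing import List, Set, Tuple

Bin = Tuple[str, int]  # (chromosome, bin start); illustrative naming

ALLOWED_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def select_feature_bins(counts: List[List[int]],
                        bins: List[Bin],
                        blacklisted: Set[Bin],
                        top_fraction: float = 0.05):
    # Step 2: binarize the cell-by-bin count matrix
    binary = [[1 if value > 0 else 0 for value in row] for row in counts]

    # Steps 3 and 4: drop blacklisted bins and bins outside chr1-22, X, Y
    keep = [j for j, b in enumerate(bins)
            if b not in blacklisted and b[0] in ALLOWED_CHROMS]

    # Remove the top fraction of bins with the highest coverage
    coverage = {j: sum(row[j] for row in binary) for j in keep}
    n_drop = int(len(keep) * top_fraction)
    ranked = sorted(keep, key=lambda j: coverage[j], reverse=True)
    dropped = set(ranked[:n_drop])

    selected = [j for j in keep if j not in dropped]
    matrix = [[row[j] for j in selected] for row in binary]
    return matrix, [bins[j] for j in selected]


# Toy example: 21 bins on chr1, one bin on chrM, three cells
toy_bins = [("chr1", i * 500_000) for i in range(21)] + [("chrM", 0)]
toy_counts = [[2] * 22, [0] * 22, [0] * 22]
toy_counts[1][5] = 1
toy_counts[2][5] = 3
toy_matrix, toy_selected = select_feature_bins(
    toy_counts, toy_bins, blacklisted={("chr1", 0)})
```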


Step 4: Dimensionality reduction 


SnapATAC applies a nonlinear dimensionality 
reduction method called diffusion maps, which 
is highly robust to noise and perturbation. 
However, the computational time of the dif- 
fusion maps algorithm scales exponentially 
with the increase of number of cells. To over- 
come this limitation, we used adjacency spec- 
tral embedding (https://github.com/yixuan/ 
RSpectra) instead of diffusion maps, which 
can easily scale up to tens of thousands of nuclei
without memory issues. We combined this with the
Nyström method (a sampling technique) (68)
to generate the low-dimensional embedding
for large-scale datasets.

A Nyström landmark diffusion maps algorithm
includes three major steps: (i) sampling:
sample a subset of K (K ≪ N) cells from N total
cells as “landmarks”; (ii) embedding: compute 
a diffusion map embedding for K landmarks; 
and (iii) extension: project the remaining N-K 
cells onto the low-dimensional embedding as 
learned from the landmarks to create a joint 
embedding space for all cells. 

Because we started with more than 1.1 million single
nuclei, we applied this strategy
to level 1 and 2 clustering. A total of 35,000 cells
were sampled as landmarks and the remaining
query cells were projected onto the diffusion
maps embedding of landmarks. Later,
for the level 3 clustering, diffusion map embeddings
were directly calculated from all
nuclei. 
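
The sampling, embedding, and extension steps can be illustrated with NumPy. This is a simplified sketch under stated assumptions, not the SnapATAC code: it uses a Gaussian affinity on dense toy data (whereas the study works with binarized accessibility profiles), keeps the raw eigenvectors without eigenvalue scaling, and all function names are illustrative.

```python
import numpy as np


def gaussian_affinity(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise Gaussian kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def landmark_diffusion_map(landmarks: np.ndarray, n_components: int = 2,
                           sigma: float = 1.0):
    """Embedding step: eigenvectors of the Markov matrix P = D^-1 W on landmarks."""
    w = gaussian_affinity(landmarks, landmarks, sigma)
    d = w.sum(axis=1)
    s = w / np.sqrt(np.outer(d, d))            # symmetric conjugate of P
    eigvals, eigvecs = np.linalg.eigh(s)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][1:n_components + 1]  # skip trivial one
    lam = eigvals[order]
    psi = eigvecs[:, order] / np.sqrt(d)[:, None]          # right eigenvectors of P
    return lam, psi


def nystrom_extend(query: np.ndarray, landmarks: np.ndarray,
                   lam: np.ndarray, psi: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Extension step: psi(x) = (1 / lambda) * sum_j p(x, j) * psi_j."""
    w = gaussian_affinity(query, landmarks, sigma)
    p = w / w.sum(axis=1, keepdims=True)
    return (p @ psi) / lam


rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 5))                               # toy data, N = 200
landmark_idx = rng.choice(len(cells), size=40, replace=False)   # sampling, K = 40
landmarks = cells[landmark_idx]
lam, psi = landmark_diffusion_map(landmarks)
embedding = nystrom_extend(cells, landmarks, lam, psi)          # joint embedding
```

A useful sanity check for any such implementation is that extending a landmark reproduces its own landmark coordinates, which follows directly from the eigenvector equation.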


Step 5: Batch correction 


To correct the potential bias from donors and 
technical replicates, we applied batch correction on 
these variables using R package “Harmony” (68). 


Step 6: Eigenvector selection 


To determine the number of eigenvectors of 
diffusion operator to include for downstream 
analysis, we generated an “Elbow plot,” to rank 
all eigenvectors based on the percentage of var- 
iance explained by each one. For each round of 
clustering, we selected the top 10 to 20 eigen- 
vectors that captured most of the variance. 


Step 7: Generate K Nearest Neighbor Graph 


Using the selected significant eigenvectors, we 
next constructed a K nearest neighbor (KNN)
graph. Each cell is a node, and the KNNs of each
cell were identified according to the Euclidean
distance, with edges drawn between neighbors
in the graph. To overcome the limitations
on CPU memory and speed in this process, we
applied ANNOY (Approximate Nearest Neigh- 
bors Oh Yeah) instead of the original KNN 
function (69). 
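
For small inputs, the graph construction can be written as an exact brute-force search. The sketch below is a stand-in for illustration only: it does not use ANNOY, which returns approximate neighbors and scales to far larger datasets, and the toy coordinates are made up.

```python
import math
from typing import List, Sequence, Set, Tuple


def knn_graph(points: List[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Exact KNN graph; returns undirected edges as (smaller index, larger index)."""
    edges = set()
    for i, p in enumerate(points):
        # Euclidean distance from cell i to every other cell
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy embedding with two well-separated pairs of cells
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_edges = knn_graph(toy_points, k=1)
```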


Step 8: Graph-based clustering 
We applied the Leiden algorithm on the KNN
graph using the Python package leidenalg
(https://github.com/vtraag/leidenalg) (70).


Step 9: Optimization on cluster resolution 


We tested different values of the “resolution_parameter”
parameter (from 0 to 1 in steps of 0.1) to
determine the optimal resolution for different
cell populations. For each resolution value, we
tested whether there was clear separation between
nuclei. To do so, we generated a cell-by-cell
consensus matrix in which each element represents
the fraction of observations in which two nuclei
are part of the same cluster. A perfectly stable
matrix would consist entirely of zeros and ones,
meaning that two nuclei either cluster together
or not in every iteration. The relative stability
of the consensus matrices can be used to infer
the optimal resolution. To this end, we generated
a consensus matrix based on 100 rounds
of Leiden clustering with randomized starting
seed s. Let $M^s$ denote the N × N connectivity
matrix resulting from applying the Leiden algorithm
to the dataset $D^s$ with different seeds.
The entries of $M^s$ are defined as follows:


$$M^s(i,j) = \begin{cases} 1, & \text{if single nuclei } i \text{ and } j \text{ belong to the same cluster} \\ 0, & \text{otherwise} \end{cases}$$


Let $I^s$ be the N × N indicator matrix where
the (i, j)-th entry is equal to 1 if nuclei i and j
are in the same perturbed dataset $D^s$, and 0
otherwise. Then, the consensus matrix C is defined
as the normalized sum of all connectivity
matrices of all the perturbed $D^s$:

$$C(i,j) = \frac{\sum_s M^s(i,j)}{\sum_s I^s(i,j)}$$


The entry (i, j) in the consensus matrix is the
number of times single nuclei i and j were
clustered together divided by the total number
of times they were selected together. The matrix
is symmetric, and each element lies within
the range [0, 1]. We examined the cumulative
distribution function (CDF) curve and calculated
the PAC score to quantify stability at each resolution.
The resolution with a local minimum of
the PAC score denotes the parameters for the
optimal clusters. In case there were multiple
local minima of the PAC score, we picked the one with
the higher resolution. Another measurement is the
DC, which reflects the dispersion (ranging from
0 to 1) of the consensus matrix C from the
value 0.5. The closer the DC is to 1, the more
perfect the consensus matrix is, and thus the more
stable the clustering is. In a perfect consensus
matrix, all entries are 0 or 1, meaning that all
connectivity matrices are identical. The DC is
defined as:




Finally, for every cluster, we tested whether 
we could identify differential features com- 
pared with all other nuclei (background) and 
the nearest nuclei (local background) using 
the function ‘findDAR’. 
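
The stability measures used in Step 9 can be sketched as follows. Several choices here are assumptions rather than details taken from the text: every nucleus is assumed to be present in every run (so the indicator matrix is all ones), the PAC score is computed as the fraction of off-diagonal consensus values strictly between 0.1 and 0.9, and the DC is written in its commonly used form, DC = (1/N^2) Σ_ij 4 (C(i,j) − 1/2)^2.

```python
from typing import List


def consensus_matrix(labelings: List[List[int]]) -> List[List[float]]:
    """Fraction of clustering runs in which nuclei i and j share a cluster."""
    n = len(labelings[0])
    together = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    runs = len(labelings)
    return [[count / runs for count in row] for row in together]


def pac_score(c: List[List[float]], lower: float = 0.1, upper: float = 0.9) -> float:
    """Proportion of ambiguously clustered pairs (assumed interval bounds)."""
    n = len(c)
    pairs = [c[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(1 for value in pairs if lower < value < upper) / len(pairs)


def dispersion_coefficient(c: List[List[float]]) -> float:
    """Assumed form: mean of 4 * (C(i, j) - 0.5)^2 over all entries."""
    n = len(c)
    total = sum(4.0 * (c[i][j] - 0.5) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


stable = consensus_matrix([[0, 0, 1, 1], [0, 0, 1, 1]])
unstable = consensus_matrix([[0, 0, 1, 1], [0, 1, 1, 1]])
```

In the stable example every entry is 0 or 1, so the PAC score is 0 and the DC is 1; in the unstable example half of the pairs are ambiguous.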


Step 10: Visualization 
For visualization, we applied UMAP (71). 


Dendrogram construction for mouse 
brain cell types 


First, we calculated for each cCRE the median accessibility
per cluster and used this value as
cluster centroid. Next, we calculated the CV 
for the cluster centroid of each element across 
major cell types. Finally, we only kept variable 
elements with CV larger than 10% quantile and 
smaller than 90% quantile for dendrogram 
construction. 

We used the set of variable features defined 
above to calculate a correlation-based distance 
matrix. Next, we performed linkage hierar- 
chical clustering using the R package pvclust 
(v.2.0) (72) with parameters method.dist=“cor” 
and method.hclust=“ward.D2”. The confidence 
for each branch of the tree was estimated by 
the bootstrap resampling approach. 


Regional specificity of cell types 


The specificity score is defined as Jensen-Shannon 
divergence, which measures the similarity be- 
tween two probability distributions. For each 
cell subclass, the contribution of different brain 
regions is first calculated. Then, we compared 
this distribution with the contribution of brain 
regions calculated from all sampled cells. We 
used function “JSD” from the R package philen- 
tropy for this analysis (73). 
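
The specificity score can be sketched in plain Python. The base-2 logarithm is an assumption made here so that the score lies between 0 and 1, and the regional counts in the example are made up.

```python
import math
from typing import List, Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD with base-2 logarithm between two discrete distributions."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def regional_specificity(subclass_counts: List[int], all_counts: List[int]) -> float:
    """Compare a subclass's regional distribution with that of all cells."""
    p = [c / sum(subclass_counts) for c in subclass_counts]
    q = [c / sum(all_counts) for c in all_counts]
    return jensen_shannon_divergence(p, q)


# Made-up counts over three brain regions
restricted = regional_specificity([10, 0, 0], [10, 10, 10])
ubiquitous = regional_specificity([5, 5, 5], [10, 10, 10])
```

A subclass found in a single region receives a higher score than one whose regional distribution mirrors that of all sampled cells, for which the score is 0.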


Identification of reproducible peak sets 
in each cell cluster 


We performed peak calling according to the 
ENCODE ATAC-seq pipeline
(https://www.encodeproject.org/atac-seq/). For every cell cluster,
we combined all properly paired reads to
generate a pseudobulk ATAC-seq dataset for 
individual biological replicates. In addition, 
we generated two pseudo-replicates which com- 
prise half of the reads from each biological 
replicate. We called peaks for each of the four
datasets and a pool of both replicates independently.
Peak calling was performed on the Tn5-corrected
single-base insertions using MACS2
(30) with these parameters: --shift -75 --extsize
150 --nomodel --call-summits --SPMR -q 0.01.
Finally, we extended peak summits by 250 bp 
on either side to a final width of 501 bp for 
merging and downstream analysis. To gener- 
ate a list of reproducible peaks, we kept peaks 
that 1) were detected in the pooled dataset
and overlapped ≥50% of peak length with a
peak in both individual replicates or 2) were
detected in the pooled dataset and overlapped
≥50% of peak length with a peak in both
pseudo-replicates. 
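
The reproducibility rule can be expressed compactly. In this sketch, peaks are hypothetical (chromosome, start, end) tuples, and the overlap fraction is measured relative to the length of the pooled peak, which is an assumption; the example coordinates are made up.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlap_fraction(peak: Peak, other: Peak) -> float:
    """Overlap between two peaks as a fraction of the first peak's length."""
    (chrom1, start1, end1), (chrom2, start2, end2) = peak, other
    if chrom1 != chrom2:
        return 0.0
    overlap = min(end1, end2) - max(start1, start2)
    return max(0, overlap) / (end1 - start1)


def supported(peak: Peak, peak_set: List[Peak], min_fraction: float) -> bool:
    return any(overlap_fraction(peak, other) >= min_fraction for other in peak_set)


def reproducible_peaks(pooled: List[Peak],
                       replicates: List[List[Peak]],
                       pseudo_replicates: List[List[Peak]],
                       min_fraction: float = 0.5) -> List[Peak]:
    kept = []
    for peak in pooled:
        in_replicates = all(supported(peak, r, min_fraction) for r in replicates)
        in_pseudo = all(supported(peak, r, min_fraction) for r in pseudo_replicates)
        if in_replicates or in_pseudo:
            kept.append(peak)
    return kept


pooled = [("chr1", 100, 601), ("chr1", 2000, 2501), ("chr2", 50, 551)]
replicates = [[("chr1", 150, 651)], [("chr1", 300, 801)]]
pseudo_replicates = [[("chr1", 2000, 2501)], [("chr1", 2100, 2601)]]
reproducible = reproducible_peaks(pooled, replicates, pseudo_replicates)
```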

We found that when cell populations varied
in read depth or number of nuclei, the MACS2 
score varied proportionally because of the na- 
ture of the Poisson distribution test in MACS2 
(30). Ideally, we should perform a reads-in- 
peaks normalization, but in practice, this type 
of normalization is not possible because we 
don’t know how many peaks we will get. To 
account for differences in performance of 
MACS2 (30) based on read depth and/or number
of nuclei in individual clusters, we converted
MACS2 peak scores (-log10(q-value)) to
“score per million.” We filtered reproducible
peaks using a “score per million” cutoff
of 2.

We only kept reproducible peaks on chromosomes
1 to 22 and both sex chromosomes,
and filtered out peaks overlapping the ENCODE blacklist (hg38,
https://github.com/Boyle-Lab/Blacklist) (67). A union
peak list for the whole dataset was obtained by 
merging peak sets from all cell clusters using 
BEDtools (74). 
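
The score normalization can be sketched as follows. The exact formula is not spelled out in the text; the sketch assumes that each peak score is divided by the sum of all peak scores in the same cluster and multiplied by one million, and that peaks at or above the cutoff are kept. The peak tuples are made up.

```python
from typing import List, Tuple

ScoredPeak = Tuple[str, int, int, float]  # (chromosome, start, end, MACS2 score)


def score_per_million(scores: List[float]) -> List[float]:
    """Assumed normalization: score * 1e6 / (sum of all scores in the cluster)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("peak scores must sum to a positive value")
    return [score * 1e6 / total for score in scores]


def filter_by_spm(peaks: List[ScoredPeak], cutoff: float = 2.0) -> List[ScoredPeak]:
    spm = score_per_million([peak[3] for peak in peaks])
    return [peak for peak, value in zip(peaks, spm) if value >= cutoff]


normalized = score_per_million([1.0, 1.0, 2.0])
toy_peaks = [("chr1", 0, 501, 999_998.5), ("chr1", 1000, 1501, 1.5)]
passing = filter_by_spm(toy_peaks)
```

Because the normalized scores always sum to one million within a cluster, the same cutoff can be applied to clusters with very different read depths.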


Computing CA scores 


Accessibility of cCREs in individual clusters 
was quantified by counting the fragments in 
individual clusters normalized by read depth 
[counts per million (CPM)]. For each gene, we 
summed up the counts within the gene body + 
2 kb upstream to calculate the gene activity 
score (GAS), which was used for integrative 
analysis with other single cell modalities. For 
better visualization, we smoothed GAS to 50 
nearest neighbor nuclei using Markov Affinity- 
based Graph Imputation of Cells (MAGIC) (75). 
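
Both quantities are simple counting operations, sketched below. The strand-aware definition of "upstream" and the rule that a fragment is counted when it overlaps the window are assumptions for illustration, and the coordinates are made up.

```python
from typing import Dict, List, Tuple


def counts_per_million(fragment_counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-cCRE fragment counts by the total count in the cluster."""
    total = sum(fragment_counts.values())
    return {name: count * 1e6 / total for name, count in fragment_counts.items()}


def gene_activity_score(fragments: List[Tuple[int, int]],
                        gene_start: int, gene_end: int, strand: str,
                        upstream: int = 2000) -> int:
    """Count fragments overlapping the gene body plus 2 kb upstream of the TSS."""
    if strand == "+":
        window_start, window_end = gene_start - upstream, gene_end
    else:
        window_start, window_end = gene_start, gene_end + upstream
    return sum(1 for start, end in fragments
               if start < window_end and end > window_start)


cpm = counts_per_million({"cCRE_a": 1, "cCRE_b": 3})
toy_fragments = [(900, 1000), (950, 1050), (4000, 4100), (5000, 5100), (6900, 7100)]
gas_plus = gene_activity_score(toy_fragments, 3000, 5000, "+")
gas_minus = gene_activity_score(toy_fragments, 3000, 5000, "-")
```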


Integrative analysis of snATAC-seq 
and single-cell datasets 


For integrative analysis, we first filtered brain 
regions that matched samples profiled in this 
study (same brain dissections were used in 
companion papers). Second, we manually subset
cell types into two groups, neurons and
non-neurons, by checking both signal on variable
marker genes and taxonomy labels from
scRNA-seq, snmC-seq, and snm3C-seq. To directly
compare our single nucleus CA derived
cell clusters with the single cell transcriptomics 
defined taxonomy of the human brain, we first 
used the snATAC-seq data to impute RNA ex- 
pression levels according to the CA of gene 
promoter and gene body as described previ- 
ously. For single cell methylome, snmC-seq and 
snm3C-seq, the signal on the gene body was negatively
correlated with gene expression. We
imputed the RNA expression levels from methylation
signals by subtracting them from 1.

Next, we used a three-step method analogous
to Seurat v3 to project three modalities
onto the same space: (i) using canonical correlation
analysis to capture the shared variance
across cells between datasets; (ii) finding
anchors as 50 mutual nearest neighbors be- 
tween the two datasets; and (iii) pulling the 
three modalities into the same space. A de- 
tailed description can be found in a companion 
paper (76). To quantify the similarity between 
cell clusters from two modalities, we calculated 
an overlapping score as the sum of the minimum 
proportion of cells/nuclei in each cluster that 
overlapped within each coembedding cluster 
(8). Cluster overlaps varied from 0 to 1 and were
visualized as a heatmap with snATAC-seq clusters
in rows and scRNA-seq clusters in columns.
We found strong correspondence between the
two modalities, for example scRNA-seq and
snATAC-seq, which was evidenced by coembedding
of both transcriptomic (T-type) and
CA (A-type) cells in the same joint clusters, as
well as coembedding with methylation (M-type)
cells (fig. S6). For this analysis, we examined
neurons and non-neuronal cell classes separately,
because CG methylation was more
informative for non-neurons than CH
methylation. 
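
The overlap score between one cluster from each modality can be sketched as follows. Each input is the list of coembedding cluster labels assigned to the cells of one original cluster; this input layout and the example labels are illustrative.

```python
from collections import Counter
from typing import Hashable, List


def overlap_score(coclusters_a: List[Hashable], coclusters_b: List[Hashable]) -> float:
    """Sum over coembedding clusters of the smaller of the two proportions."""
    frac_a = {k: v / len(coclusters_a) for k, v in Counter(coclusters_a).items()}
    frac_b = {k: v / len(coclusters_b) for k, v in Counter(coclusters_b).items()}
    shared = set(frac_a) | set(frac_b)
    return sum(min(frac_a.get(k, 0.0), frac_b.get(k, 0.0)) for k in shared)


# An snATAC-seq cluster and an scRNA-seq cluster spread over coclusters 1 and 2
partial = overlap_score([1, 1, 1, 2], [1, 1, 2, 2])
identical = overlap_score([1, 1, 2], [1, 1, 2])
disjoint = overlap_score([1, 1], [2, 2])
```

The score is 1 when both clusters distribute identically across the coembedding clusters and 0 when they never share one.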


Identification of cis regulatory modules 


We used non-negative matrix factorization 
(NMF) (77) to group cCREs into cis-regulatory
modules based on their relative accessibility
across major clusters. We adapted NMF [Python
package: sklearn (78)] to decompose the cCRE-by-cluster
matrix V (N × M, N rows: cCREs, M
columns: cell clusters) into a coefficient matrix
H (R × M, R rows: number of modules) and a
basis matrix W (N × R), with a given rank R:


$$V \approx WH$$


The basis matrix defines module related ac- 
cessible cCREs, and the coefficient matrix de- 
fines the cell cluster components and their 
weights in each module. The key issue to de- 
compose the occupancy profile matrix was to 
find a reasonable value for the rank R (i.e., 
the number of modules). Several criteria have 
been proposed to decide whether a given rank 
R decomposes the occupancy profile matrix 
into meaningful clusters. Here we applied two 
measurements “Sparseness” (79) and “Entropy” 
(80) to evaluate the clustering result. Average
values were calculated from 100 NMF
runs at each given rank with random seeds,
which ensures that the measurements are stable.

Next, we used the coefficient matrix to asso- 
ciate modules with distinct cell clusters. In the 
coefficient matrix, each row represents a mod- 
ule and each column represents a cell cluster. 
The values in the matrix indicate the weights 
of clusters in their corresponding module. The 
coefficient matrix was then scaled by column
(cluster) from 0 to 1. Subsequently, we used a
coefficient >0.1 (~95th percentile of the whole
matrix) as threshold to associate a cluster with
a module.

In addition, we associated each module with 
accessible elements using the basis matrix. For 
each element and each module, we derived a 
basis coefficient score, which represents the 
accessible signal contributed by all clusters in
the defined module. In addition, we also im- 
plemented and calculated a basis-specificity 
score called “feature score” for each accessi- 
ble element using the “kim” method (80). The 
feature score ranges from 0 to 1. A high feature 
score means that a distinct element is specif- 
ically associated with a specific module. Only 
features that fulfill both following criteria 
were retained as module specific elements: 1. 
feature score greater than median + 2 stan- 
dard deviations; 2. the maximum contribu- 
tion to a basis component is greater than 
the median of all contributions (i.e., of all ele- 
ments of W). 
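
As a rough illustration of the two retention criteria, the sketch below computes an entropy-based feature score of the form 1 + (1/log2 R) Σ p log2 p, where p is the share of each module in an element's row of W. This is the form commonly used for the “kim” method; the function names, the use of the population standard deviation, and the toy matrix are assumptions made for the example.

```python
import math
import statistics


def feature_scores(W):
    """Entropy-based feature score for each row of W (elements x modules).

    Requires at least two modules (R > 1), since the score divides by log2(R).
    """
    R = len(W[0])
    scores = []
    for row in W:
        total = sum(row)
        if total == 0:
            scores.append(0.0)  # an all-zero row has no module preference
            continue
        neg_entropy = 0.0
        for w in row:
            p = w / total
            if p > 0:
                neg_entropy += p * math.log2(p)
        scores.append(1.0 + neg_entropy / math.log2(R))
    return scores


def module_specific(W):
    """Return (element index, module index) pairs passing both criteria."""
    scores = feature_scores(W)
    cut = statistics.median(scores) + 2 * statistics.pstdev(scores)
    overall_median = statistics.median(w for row in W for w in row)
    keep = []
    for i, row in enumerate(W):
        if scores[i] > cut and max(row) > overall_median:
            keep.append((i, row.index(max(row))))
    return keep


# Nine unspecific elements and one element loaded entirely on module 0.
W = [[1.0, 1.0]] * 9 + [[4.0, 0.0]]
specific = module_specific(W)
```

Only the last element is retained here, because it is the only row whose weight is concentrated in one module.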


Identification of differentially accessible regions 
and definition of specificity score 


To identify cCREs that were differentially accessible across cell clusters or brain regions, we constructed a logistic regression model predicting cluster/region membership based on each cCRE individually and compared this with a null model using a likelihood ratio test. We used the two functions “fit_models” and “compare_models” in the R package Monocle3 (v0.2.2) (81) to perform the differential test. We designed the full model as


$$\operatorname{logit}(P_{ij}) = a_j + m_j + d_j + e_j$$


and a reduced model as

$$\operatorname{logit}(P_{ij}) = a_j + d_j + e_j$$


where $P_{ij}$ represents the probability that the $i$th site is accessible in the $j$th cell, $a$ is the log10-transformed total number of sites observed as accessible for the $j$th cell, $m$ is the membership of the $j$th cell in either the cluster or region being tested, $d$ is the donor label for the $j$th cell, and $e$ is an error term.

For each set of tests, between cell types or between regions in every cell subclass, we only kept cCREs that overlapped with peaks identified in the corresponding cell types. A likelihood ratio test was then used to determine whether the full model (including cell cluster or region membership) provided a significantly better fit to the data than the reduced model. After correcting P values using the Benjamini-Hochberg method, we set an FDR cutoff of 0.001 to select significant differential cCREs. Log2 fold change was used for two-group comparisons. For multiple groups, we calculated a Jensen-Shannon divergence-based specificity score, described in a previous study, to better assign differential cCREs to cell types or brain regions. The fraction of accessibility f in each cluster was first calculated for each site i. We normalized these fractions by multiplying them by the corresponding scaling factors, which account for differences in overall complexity across groups. To do so, the median number of sites accessible in individual cells, c, was calculated for each cluster and log10-transformed. Then, we took the ratio of the average of c (across all clusters) to the median accessibility in each cluster as the scaling factor. The corrected fraction of accessibility for each cCRE was then converted to a probability by scaling across groups. Then, we calculated the Jensen-Shannon divergence (JSD) between two probability distributions. For example, taking the probability distribution for the 1st cCRE as d1, to test whether this cCRE is specific to group 1, we assumed another probability distribution:


$$d_2 = \begin{cases} 1, & \text{group 1} \\ 0, & \text{otherwise} \end{cases}$$


Function “JSD” in R package “philentropy” 
was used to calculate JSD between these 
two probability distributions, and the Jensen- 
Shannon-based specificity score (JSS) was de- 
fined as: 


$$\mathrm{JSS} = 1 - \sqrt{\mathrm{JSD}}$$
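
For illustration, the specificity score can be reproduced with a short Python sketch. It is a simplified stand-in for the R workflow described above: the function names are hypothetical, base-2 logarithms are assumed so that the JSD lies between 0 and 1, and the complexity scaling factors are omitted.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs (value in [0, 1])."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def jss(fractions, group):
    """Specificity of one cCRE for `group`, given per-group accessibility.

    `fractions` must contain at least one positive value.
    """
    total = sum(fractions)
    d1 = [f / total for f in fractions]            # observed distribution
    d2 = [1.0 if g == group else 0.0               # idealized "only this group"
          for g in range(len(fractions))]
    return 1.0 - math.sqrt(jsd(d1, d2))
```

A cCRE accessible only in the tested group scores 1, one accessible only in other groups scores 0, and a uniformly accessible cCRE falls in between.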


For each group, we calculated the JSS for every cCRE. To find a reasonable cutoff for distinguishing restricted from general cCREs, we considered the JSS from all cCREs that were not identified as differentially accessible (by the likelihood ratio test) as a background distribution, and the JSS from cCREs that passed our likelihood ratio test threshold and had positive values as true positives. We set an empirical FDR cutoff at which the type I error was no more than 5%.

Finally, because differential cCREs could be assigned to multiple subtypes or brain regions based on the JSS, we named those that could be assigned to only one type or region region-specific or cell-type-specific cCREs.


Predicting enhancer-promoter interactions 


First, coaccessible regions were identified for all open regions in each cell cluster separately (200 randomly selected nuclei per cluster, or all nuclei for clusters with <200 nuclei), using Cicero (82) with the following parameters: aggregation k = 10, window size = 500 kb, distance constraint = 250 kb. To find an optimal coaccessibility threshold for each cluster, we generated a randomly shuffled cCRE-by-cell matrix as background and identified coaccessible regions from this shuffled matrix. We fitted the distribution of coaccessibility scores from the randomly shuffled background to a normal distribution model using the R package fitdistrplus (83). Next, we tested every coaccessibility pair and set the cutoff at the coaccessibility score corresponding to an empirically defined significance threshold of FDR < 0.01.


cCREs outside of ±2 kb of transcription start sites (TSSs) in GENCODE hg38 (v29) (64) were considered distal. Next, we assigned coaccessibility pairs to three groups: proximal-to-proximal, distal-to-distal, and distal-to-proximal. In this study, we focus only on distal-to-proximal pairs. We calculated PCC between gene expression and cCRE accessibility across matched T-type and A-type clusters to examine the relationship between coaccessibility pairs. To do so, we first aggregated all nuclei/cells from scRNA-seq and snATAC-seq for every joint cluster to calculate accessibility scores (log2 CPM) and relative expression levels (log2 normalized UMI). Then, PCC was calculated for every gene-cCRE pair within a 1-Mbp window centered on the TSS of every gene. We also generated a set of background pairs by randomly selecting regions from different chromosomes and shuffling cluster labels. Finally, we fit a normal distribution model and defined a cutoff at the PCC score corresponding to an empirically defined significance threshold of FDR < 0.01 to select significantly positively correlated cCRE-gene pairs.
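
The background-based cutoff used here (and, analogously, for the Cicero coaccessibility scores) can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation: a normal distribution is fitted to the background correlations, upper-tail P values are computed for the observed pairs, and Benjamini-Hochberg adjustment is assumed as the way the FDR is controlled. All function names are illustrative.

```python
import math
from statistics import NormalDist, mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest to smallest P
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pairs(observed, background, fdr=0.01):
    """Flag observed PCCs that exceed the fitted null at the given FDR."""
    null = NormalDist(mean(background), pstdev(background))
    pvals = [1.0 - null.cdf(r) for r in observed]   # one-sided: positive PCC
    return [q < fdr for q in bh_adjust(pvals)]
```

For example, `significant_pairs([0.95, 0.0], [-0.2, -0.1, 0.0, 0.1, 0.2])` flags only the first pair, whose correlation lies far in the upper tail of the fitted background.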


RNAscope in situ hybridization 
Tissue preparation 


All tissues were cryostat sectioned at 14 µm onto
SuperFrost Plus charged slides and allowed to
adhere to the slide. The slides were then stored
at −80°C until further use.


Hybridization 


RNAscope in situ hybridization multiplex ver- 
sion 2 was performed as per the protocol estab- 
lished by Advanced Cell Diagnostics (ACD) with 
minor modifications. In brief, slides were fixed 
using cold (4°C) 4% PFA for 15 min. The samples 
were then dehydrated in 50% ethanol (5 min), 
70% ethanol (5 min) and 100% ethanol (5 min, 
twice) at room temperature. The slides were air 
dried for 10 min, after which boundaries were 
drawn around each section using a hydrophobic 
pen (ImmEdge PAP pen; Vector Labs). After 
the hydrophobic boundaries dried, sections 
were incubated with hydrogen peroxide (ACD, 
~5 drops per section) for 10 min at RT. Sections
were then washed twice with PBS for
2 min at RT and, subsequently, incubated
with 4-5 drops of RNAscope® Protease IV 
(15 min) at RT. Slides were washed twice in 1X 
phosphate buffered saline (PBS, pH 7.4) at 
room temperature (2 min each). Each slide 
was then placed in a prewarmed humidity con- 
trol tray (ACD) containing dampened filter 
paper, and a mixture of Channel probes (50:1:1 
dilution, as per ACD instructions) was pipetted 
onto each section until fully covered. The hu- 
midity control tray was placed in a HybEZ 
oven (ACD) for 2 hours at 40°C. After probe 
incubation, the slides were washed twice in 
1X RNAscope wash buffer and stored over- 
night in 5X SSC (Sigma-Aldrich) at RT. The 
following day, slides were returned to the oven
for 30 min after submersion in AMP-1 reagent
(~5 drops). Two washes and amplification were
repeated using AMP reagents with 30-min,
30-min, and 15-min incubation periods, respectively.
HRP-C1 (ACD, ~5 drops) was added to
the slide and incubated in the HybEZ oven at
40°C for 15 min. Two washes with 1X wash
buffer (2 min each) were performed and 150 µl of
pre-assigned Opal dye was added (Opal Dyes; Akoya
Biosciences, Opal Dye diluted 1:1500 in TSA 
buffer). The slides were incubated in the oven 
for 30 min before being washed twice in 1X 
wash buffer. Slides were then covered in HRP 
blocker (ACD, ~5 drops) and incubated for 
15 min before being washed twice. Slides were 
then washed two times in 0.1M phosphate 
buffer (PBS, pH=7.4) and incubated with DAPI 
(ACD, ~5 drops) for 30 s before being washed in 
PBS, air dried, and cover-slipped with Prolong 
Gold Antifade mounting medium. Slides were 
stored at 4°C until imaging. 


Image quantification 


Fluorescence signals were acquired using a
Zeiss AiryScan confocal laser scanning microscope
equipped with a Plan-APOCHROMAT
63x oil-immersion lens (NA 1.4). Six fields of
view (three per layer) of one tissue section were 
analyzed using Imaris (Bitplane). Before quan- 
tification, regions containing lipofuscin signals 
(defined by overlapping nonspecific signals 
appearing in both channels) were subtracted 
from each image. Then, positive-stained nuclear 
signals, defined by automated surface detec- 
tion function, were segmented, and classified 
into two groups depending on whether they 
expressed Gad2. Finally, Chrna2 expression 
within those two groups was quantified. 


Paired-Tag experimental procedure 


The Paired-Tag experiment was performed on human
M1C tissue from donor D4, to which snATAC-seq
was also applied. The Paired-Tag experiment
was performed as previously described (84) 
with minor modifications detailed below. First, 
antibodies against H3K27ac (Abcam, ab4729) 
or H3K27me3 (Abcam, ab195477) were pre- 
incubated with protein A-Tn5 fusion protein, 
and then incubated with permeabilized nu- 
clei overnight, to target the binding of protein 
A-fused Tn5 to chromatin. Tagmentation re- 
action and reverse transcription (RT) were 
then sequentially performed. Reactions were 
carried out in 12 different wells, each well con- 
taining roughly 2M nuclei and a well-specific 
DNA barcode included in the Tn5 transposase 
adaptors and RT primers, to label different 
samples or replicates. Next, a ligation-based 
combinatorial barcoding strategy was used to 
introduce the 2nd and 3rd rounds of DNA
barcodes to the nuclei by sequentially attaching
well-specific DNA barcodes to the 5′ end
of both chromatin DNA fragments and cDNA
from RT in 4 x 96-well plates in each round.
Finally, the barcoded nuclei were divided into
sub-libraries of 25,000 nuclei each and lysed
with Proteinase K (NEB, #P8102). The chromatin
DNA and cDNA were subsequently purified
together with SPRI beads (Beckman Coulter),
pre-amplified, and split into two parts for
RNA and DNA libraries. For the RNA part,
libraries were constructed using the Nextera 
XT DNA Library Preparation Kit (Illumina, 
FC-131-1024). For the DNA part, an adapter 
containing Truseq 15 sequencing primer bind- 
ing site was ligated to the library fragments, 
followed by DNA cleanup using SPRI beads 
and indexing PCR. Final libraries were quanti- 
fied using qPCR, and sequenced on a NovaSeq 
6000 platform (Illumina). 


Paired-Tag data processing 


A detailed, step-by-step Paired-Tag data processing
pipeline can be found at
https://github.com/cxzhu/Paired-Tag.


Preprocessing 


Cellular barcodes are first extracted from the sequencing reads by matching the linker sequences adjacent to the cellular barcodes and are then mapped to the cellular barcode reference; reads with more than 1 nt mismatch are discarded. The adapter sequences are trimmed from the 3′ end of DNA and RNA libraries, with poly-dT and random hexamer sequences further trimmed from the 3′ end of RNA libraries. Low-quality reads (L = 30, Q = 30) are excluded from further analysis. DNA reads are mapped to hg38. Reads with MAPQ < 10 are removed. PCR duplicates are also removed.
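
The barcode-matching rule (at most one mismatch against the reference) can be illustrated with a small Python sketch. It is not taken from the published pipeline; the function names are hypothetical, and discarding reads that match more than one reference barcode is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, reference, max_mismatch=1):
    """Return the unique reference barcode within max_mismatch, else None.

    Ambiguous reads (several reference barcodes within range) are dropped;
    this tie-breaking rule is an assumption, not stated in the text.
    """
    hits = [ref for ref in reference
            if len(ref) == len(observed)
            and hamming(observed, ref) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


reference = ["ACGT", "TTTT"]
corrected = match_barcode("ACGA", reference)   # one mismatch to "ACGT"
rejected = match_barcode("AAAA", reference)    # too far from both barcodes
```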


Cell clustering 


RNA alignment files are converted to a matrix with cells as columns and genes as rows. Cells with fewer than 200 features in both DNA and RNA matrices are removed. Clustering of single cells based on RNA profiles is performed with the Seurat package (69). Briefly, cell-to-gene counts are normalized, and the variable genes are selected for dimension reduction by PCA. Batch effects are corrected with Harmony (68), and single-cell gene expression profiles are visualized with UMAP and clustered with the Louvain algorithm (70).


Cell-type-specific histone modification maps


After clustering, the DNA component of Paired- 
Tag data corresponding to each histone mark 
in each cell cluster is aggregated to generate 
profiles of histone modification for the cluster. 
The aggregated aligned reads were then used 
to generate genome browser tracks to reveal 
the landscape of each histone modification in 
each cell type as defined by the RNA modality 
from the Paired-Tag data. 


Annotation 


To annotate each cCRE, we counted sequencing reads on cCREs captured from Paired-Tag experiments on two histone modifications and fit a Poisson distribution to select cCREs that were significantly (FDR < 0.05) marked by H3K27ac or H3K27me3.


Integration analyses of CA between 
humans and mice 


To better compare the open chromatin landscape within a similar cellular context, 18 major neuronal and glial cell types were extracted from the mouse cerebrum (28) for alignment with human snATAC-seq data from this study. We first calculated CA at orthologous genes and cCREs, and enrichment of orthologous motifs, from pseudobulk profiles in each cell subclass. Then, we performed multidimensional scaling analyses and projected cell classes onto a two-dimensional space.

To integrate at the single-cell level, we randomly selected 1000 nuclei from each cell subclass in the two species. Similarly, we first calculated fragment counts on orthologous genes and cCREs and the variability of orthologous motifs using chromVAR (85). For the integration analyses using orthologous genes and motifs as features, we generated Seurat objects in R. Next, variable features were identified and used to identify anchors between cells from the two species. Finally, to visualize all the cells together, we coembedded the cell profiles from the two species in the same low-dimensional space. For the integration analysis using orthologous cCREs, we used the SnapATAC package (63) as described above. We did not perform any correction for potential bias from species.


Deep learning model Epiformer 


The deep learning model takes both one-hot-encoded DNA sequences (A = [1, 0, 0, 0], C = [0, 1, 0, 0], G = [0, 0, 1, 0], T = [0, 0, 0, 1]) and conservation scores (phastCons, ranging from 0 to 1) calculated from multiple species alignment as inputs. We first divide the human genome (hg38) into segments 98,304 bp in length. Any genomic segments that overlap with human ENCODE “blacklist” regions or contain ‘N’ are discarded. The remaining one-hot-encoded DNA sequences are then multiplied by the exponentially transformed conservation score:


$$\text{inputs} = \text{one-hot-encoded DNA sequence} \times e^{\text{cons}}$$
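
A minimal sketch of this input encoding is given below. The function name and the list-of-lists representation are illustrative choices; the model itself would operate on tensors.

```python
import math

# Channel order follows the text: A, C, G, T.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def encode(sequence, cons):
    """One-hot encode `sequence` and weight each position by exp(cons).

    `cons` holds one phastCons score (0 to 1) per base. Sequences containing
    'N' raise KeyError, mirroring the rule that such segments are discarded.
    """
    if len(sequence) != len(cons):
        raise ValueError("sequence and conservation track differ in length")
    rows = []
    for base, score in zip(sequence.upper(), cons):
        weight = math.exp(score)
        rows.append([bit * weight for bit in ONE_HOT[base]])
    return rows


encoded = encode("AC", [0.0, 1.0])
```

With this transformation an unconserved base (score 0) keeps a weight of 1, and a fully conserved base (score 1) is weighted by e, so conservation scales the signal without zeroing out any position.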


The ATAC-seq signals on the corresponding genomic segments are treated as the prediction targets. We first generate and normalize the genomic signal tracks using deepTools (86) with the parameter “--normalizeUsing RPKM”. Then, each segment is cut into 768 genomic bins of 128 bp, and the mean ATAC signal across each 128-bp bin is assigned to that bin. To avoid potential bias from outlier signals, we cap signal values at 128, so that the bin signal is in the range of 0 to 128.
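
The target construction can be sketched as follows. The function name is hypothetical, and applying the cap after averaging (rather than to the per-base signal) is an assumption, because the text does not specify the order.

```python
def bin_signal(track, bin_size=128, cap=128.0):
    """Average a per-base signal track into fixed-size bins and cap each bin.

    The cap is applied to the bin mean; capping before averaging would be an
    equally plausible reading of the text.
    """
    if len(track) % bin_size != 0:
        raise ValueError("track length must be a multiple of the bin size")
    bins = []
    for start in range(0, len(track), bin_size):
        window = track[start:start + bin_size]
        bins.append(min(sum(window) / bin_size, cap))
    return bins


# A 98,304-bp segment yields 768 bins of 128 bp.
n_bins = len(bin_signal([0.0] * 98304))
toy = bin_signal([1.0] * 128 + [300.0] * 128)   # second bin exceeds the cap
```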
The model architecture is inspired by Enformer (55) with some modifications (fig. S21). It is written in the open-source machine learning framework PyTorch and contains three major parts. The first part consists of six residual convolutional blocks with max pooling (Conv Tower). A kernel size of (4, 15) is used for the first residual convolutional block, and a kernel size of (4, 5) is used for the remaining residual convolutional blocks, with padding, to extract informative sequence features at lower and higher resolution, respectively. Batch normalization and the Gaussian error linear unit (GELU) activation function are inserted between convolutional layers. The convolutional blocks with pooling reduce the spatial dimension from 98,304 bp to 768, so that each sequence position vector represents a bin of 128 bp. Second, the Conv Tower is followed by transformer blocks comprising four multi-head attention layers, which capture long-range combinations and orders of sequence features. To inject positional information, we add relative positional encodings (87). A dropout rate of 0.4 is used for the attention layers to avoid potential overfitting. SwiGLU (88), a gated linear unit (GLU) variant, is used as the activation function. Third, a pointwise convolution block aggregates the weights from the last transformer layer and outputs the predicted signals. The GELU and SwiGLU activation functions are used with a dropout rate of 0.1.

The deep learning models were trained on 20,000 human genomic sequences, each ~100 kb in length. For each training step, we calculated model loss and PCC for every 8 sequences (one sequence per GPU). The loss was used to update model parameters through back-propagation. For each epoch, the model performance (loss) was evaluated on a validation set, which contained 5000 different human genomic sequences, each ~100 kb in length. We saved the models with the lowest validation loss to allow early stopping and avoid overfitting. To further reduce overfitting, we introduced data augmentation during training by randomly shifting the input sequence by up to 3 bp and by reverse-complementing the input sequence while reversing the targets every other epoch. We used the Adam optimizer from PyTorch with a learning rate of 0.0001 and default settings for other hyperparameters. The model was trained on 4 GPUs (NVIDIA GeForce RTX 3090, 24 GB). To overcome the limitation in GPU memory, we used a gradient accumulation approach to save the gradients and update network weights every 8 batches, achieving an effective batch size of 64 (16 per GPU). We implemented a cosine learning rate scheduler with warmup by linearly increasing the learning rate from 0 to the target value in the first 10 epochs and then decreasing the learning rate to 0 by multiplying by the cosine function over the next 90 epochs. The negative log likelihood loss with a Poisson distribution of the target (PoissonNLLLoss) is used to evaluate how well the model predicts the targets. A cropping layer is introduced to trim 64 positions on each side to avoid computing the loss at the far ends, because these regions are disadvantaged: they can observe regulatory elements only on one side (toward the sequence center) and not the other (the region beyond the sequence boundaries).
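
The warmup-plus-cosine schedule described above can be written as a small function. This is a sketch under stated assumptions: the schedule is evaluated once per epoch, and the decay uses the common half-cosine form 0.5 × (1 + cos(π × progress)); the exact form used by the authors is not given in the text.

```python
import math


def learning_rate(epoch, target=1e-4, warmup=10, total=100):
    """Linear warmup to `target`, then half-cosine decay to 0 at `total`."""
    if epoch < warmup:
        return target * epoch / warmup
    # Fraction of the decay phase completed, clamped so the rate stays at 0
    # if training runs past `total` epochs.
    progress = min(1.0, (epoch - warmup) / (total - warmup))
    return target * 0.5 * (1.0 + math.cos(math.pi * progress))


schedule = [learning_rate(e) for e in range(101)]
```

The rate starts at 0, reaches 0.0001 at epoch 10, and returns to 0 at epoch 100, matching the 10 warmup epochs and 90 decay epochs in the text.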

The rest of the human genome sequences were treated as the testing set (~5000 sequences) and used to evaluate model performance. To achieve a fair comparison, we calculated PCC and loss on eight sequences randomly selected from the testing set. We finally picked the model with the lowest loss for downstream analyses. To evaluate the influence of risk variants on predicted chromatin accessibility, we performed in silico mutagenesis by altering every nucleotide within one cCRE. The predicted signals were calculated for both the reference and alternative sequences for every output genomic position/bin. We then took the sum of predicted signals over output positions/bins that overlapped with the cCRE and measured the difference by subtracting the sum for the reference sequence from the sum for the alternative sequence.
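
The scoring of a single substitution can be sketched as below. The `predict` argument stands in for the trained model and is purely hypothetical; here a toy predictor that counts G bases per 4-bp bin is used so that the example is self-contained.

```python
def variant_effect(predict, ref_seq, pos, alt_base, ccre_bins):
    """Summed predicted signal over cCRE bins: alternative minus reference.

    `predict` maps a sequence to a list of per-bin signals; `ccre_bins` lists
    the output bins that overlap the cCRE.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_signal = predict(ref_seq)
    alt_signal = predict(alt_seq)
    ref_sum = sum(ref_signal[b] for b in ccre_bins)
    alt_sum = sum(alt_signal[b] for b in ccre_bins)
    return alt_sum - ref_sum


def toy_predict(seq, bin_size=4):
    """Toy stand-in for the model: number of G bases in each bin."""
    return [seq[i:i + bin_size].count("G") for i in range(0, len(seq), bin_size)]


delta = variant_effect(toy_predict, "AAAAAAAA", pos=5, alt_base="G", ccre_bins=[1])
```

A positive value indicates that the alternative allele increases the predicted accessibility over the cCRE, and a negative value indicates a decrease.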


Motif enrichment 


We performed both de novo and known motif 
enrichment analysis using Homer (v4.11) (89). 
For cCREs in the consensus list, we scanned a
region of ±250 bp around the center of the
element. Randomly selected background regions
were used for motif discovery.


GO enrichment 


We performed gene ontology enrichment analysis using the R package Enrichr (90). The gene set library “GO_Biological_Process_2018” was used with default parameters. The combined score is defined as the log of the P value computed using Fisher’s exact test multiplied by the z-score of the deviation from the expected rank.


GWAS enrichment 


To enable comparison with GWASs of human phenotypes, we used liftOver with the default setting “-minMatch=0.95” to convert accessible elements from hg38 to hg19 genomic coordinates (91). Next, we reciprocally lifted the elements back to hg38 and kept only the regions that mapped to their original loci. We further removed converted regions with length > 1 kb.

We obtained GWAS summary statistics for quantitative traits related to neuropsychiatric disorders and control traits (table S25). We converted the summary statistics to the standard format for linkage disequilibrium (LD) score regression. We used homologous sequences for each major cell type as a binary annotation and the superset of all candidate regulatory peaks as the background control. For each trait, we used cell type-specific LD score regression (https://github.com/bulik/ldsc) to estimate the enrichment coefficient of each annotation jointly with the background control (92).


Fine mapping 


We obtained 99% credible sets for schizophre- 
nia from the Psychiatric Genomics Consor- 
tium website (https://www.med.unc.edu/pgc/). 
Potential causal variants with a posterior probability
of association (PPA) score larger than 1%
were used for overlap with cCREs.


External datasets 


We list here the published datasets used in this study for intersection analysis. rDHS regions for hg38 were obtained from the SCREEN database (https://screen.encodeproject.org). The cCREs identified from adult and fetal human brains were downloaded from previously published snATAC-seq datasets. The PhastCons conserved elements were downloaded from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phastCons100way/). The JASPAR motif predictions were downloaded from the UCSC Genome Browser (http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/). The gene expression levels of the Vip+/VIP+ cell type from human and mouse brain were downloaded from the Allen Brain Map: Cell Types Database (https://portal.brain-map.org/atlases-and-data/rnaseq). The gene expression levels of BIN1, MEF2A, and SPI1 in normal humans and patients with Alzheimer’s disease were downloaded from the Allen Brain Map: Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) (https://portal.brain-map.org/explore/seattle-alzheimers-disease).


Spatiotemporal molecular dynamics of the
developing human thalamus 


Chang N. Kim†, David Shin†, Albert Wang, Tomasz J. Nowakowski*


INTRODUCTION: The thalamus represents a crit- 
ical node of communication between the brain 
and the outside world. The thalamus is orga- 
nized into regions called nuclei and emerges 
from progenitors in the caudal forebrain during 
embryonic development. Although the num- 
ber of thalamic nuclei and projection patterns 
is largely conserved in mammals, the molecu- 
lar trajectories of cell fate specification and their 
organization into nuclei are largely unknown, 
especially in humans. We sought to better 
understand the molecular diversity and spa- 
tial organization of cell types in the developing 
human thalamus using single-cell and spatial 
transcriptomics. 


RATIONALE: Much of our understanding of the 
molecular diversity and spatial organization of 
cell types in the mammalian thalamus comes 
from rodent models, and yet the degree to 
which this is conserved in humans remains 
unclear, especially in the context of develop- 
ment. By integrating previously published data- 
sets, we generated a reference atlas of the 
first-trimester thalamus, spanning Carnegie 
stages 6 to 10 and represented by five individuals
and 27,362 cells, and a reference atlas of
the second-trimester thalamus, spanning gestational
weeks 16 to 25 and represented by
10 individuals and 137,007 cells. Spatial valida- 
tion of cell types was performed using the 
MERFISH spatial transcriptomics platform. 


RESULTS: Our study provides the first com- 
prehensive characterization of cell types and 
their spatial organization in the developing 
human thalamus. Neurogenesis in the thal- 
amus and the neighboring prethalamus occurs 
during the first trimester, giving rise to glutamatergic
and γ-aminobutyric acid-mediated
(GABAergic) neurons. Glutamatergic neurons 
differentiate into two major subtypes that orga- 
nize into spatially and molecularly distinct 
nuclei during the second trimester. The molec- 
ular identities and spatial distribution of glu- 
tamatergic neurons in the human thalamus 
appear to be similar to those that have been 
reported in mice. By contrast, we observe a sig- 
nificant expansion in the number and spatial 
distribution of GABAergic neurons in the hu- 
man thalamus, the majority of which originate 
from brain regions outside the thalamus and 
migrate in during the second trimester. Transcriptomic
profiling reveals an unexpected degree
of GABAergic neuron diversity and confirms
the presence of what may be human-specific
GABAergic neuron subtypes that originate 
from a neighboring region of the developing 
forebrain known as the ganglionic eminences. 
The onset of gliogenesis occurs during the 
second trimester, giving rise to glial progenitor 
cells, astrocytes, and ependymal cells. Our study 
revealed spatially resolved patterns of glial cell 
types. We observe a maturity gradient along 
the medial-lateral axis during the second tri- 
mester, with glial progenitor cells showing en- 
richment in more-medial regions of the thalamus 
and astrocytes showing enrichment in more- 
lateral regions of the thalamus. We identified 
two major subtypes of astrocytes, one showing 
enrichment within the thalamus and another 
showing enrichment in neighboring brain re- 
gions adjacent to the thalamus. Ependymal 
cells line the outer edges of the thalamus. 
Other nonneuronal cell types, including cells of
oligodendrocyte lineage, microglia, and cells of 
the neurovasculature, were found throughout 
the thalamus. 


CONCLUSION: Our study defines the molecular
identities and spatial organization of cell
types in the developing human thalamus and 
highlights an understudied GABAergic neu- 
ronal population that may contribute to hu- 
man evolution. 


The list of author affiliations is available in the full article online.

Differential distribution of neuronal subtypes in thalamic nuclei

Spatially resolved cell types in the developing human thalamus. Integrative analysis of single-cell and spatial transcriptomics was used to profile the molecular
diversity and spatial organization of cell types in the developing human thalamus. The schematic on the right depicts the biased distribution of neuronal subtypes in thalamic
nuclei. EN1 and EN2, excitatory neurons 1 and 2; IN, inhibitory neuron.

The thalamus plays a central coordinating role in the brain. Thalamic neurons are organized into
spatially distinct nuclei, but the molecular architecture of thalamic development is poorly understood,
especially in humans. To begin to delineate the molecular trajectories of cell fate specification and
organization in the developing human thalamus, we used single-cell and multiplexed spatial
transcriptomics. We show that molecularly defined thalamic neurons differentiate in the second
trimester of human development and that these neurons organize into spatially and molecularly distinct
nuclei. We identified major subtypes of glutamatergic neurons that are differentially enriched
in anatomically distinct nuclei and six subtypes of γ-aminobutyric acid-mediated (GABAergic)
neurons that are shared and distinct across thalamic nuclei.


The thalamus can be anatomically subdivided into multiple nuclei with distinct patterns of neuronal projections (1). Many thalamic subregions are hypothesized to have undergone extensive remodeling and reorganization in recent evolution, including the pulvinar nucleus, which is extensively expanded in humans and other primates compared with mice (2, 3). Thalamic neurons emerge from progenitor cells located in the diencephalon of the prenatally developing nervous system (4). Recent studies have begun to apply single-cell genomics approaches to uncover the molecular dynamics of their differentiation in mice (5), but thalamic neuron differentiation in humans is less well characterized. To better understand their molecular and spatial organization, we investigated the developing human thalamus during the first and second trimesters of development [gestational weeks (GW) 6 to 25] using single-cell and spatial transcriptomics.


Neurogenesis in the first-trimester thalamus 


To identify the molecular trajectories of cell types in the developing human thalamus during neurogenesis, we analyzed single-cell RNA sequencing (scRNA-seq) datasets generated from samples from the first trimester of human development (6, 7). Across five samples (three males and two females) between GW6 and GW10, data reprocessing retained 27,362 cell transcriptomes passing quality control metrics generated using droplet-based scRNA-seq (10X Chromium v2 assay) (fig. S1A). Initial preprocessing steps included ambient RNA removal using FastCAR (8), doublet filtering with scds (9), and further cell filtering using the spliced/unspliced ratio with DropletQC (10). We then performed SCTransform normalization (11) with Harmony (12) for batch correction between different samples and Louvain clustering to identify cell types (Fig. 1, A and B). We recovered 10 clusters representing all of the major classes in the thalamus, including radial glia (RG1 and RG2), intermediate progenitor cells (IPC1 and IPC2), γ-aminobutyric acid-mediated (GABAergic) inhibitory neurons (IN1 to IN3), glutamatergic excitatory neurons (nEN and EN), and neural crest-derived mesenchymal stem cells (MES) that develop to form the cerebral vasculature (Fig. 1B) (13).

Radial glia are neural stem cells in the de- 
veloping brain. In the embryonic diencephalon, 
radial glia line the third ventricle and are par- 
titioned into three zones: prosomere 1, which 
gives rise to the pretectum; prosomere 2, which 
gives rise to the thalamus and epithalamus; 
and prosomere 3, which gives rise to the prethalamus (14).
Radial glia express PAX6 but lack the expression of the
intermediate progenitor cell (IPC) markers NEUROD1,
NEUROG1, NEUROG2, and ASCL1 and of neuronal markers such as
SLC17A6 and GAD1 (15-17). We identified two clusters of
radial glia (RG1 and RG2) (Fig. 1B). RG1 are prosomere
2-derived thalamic radial glia based on their expression of
OLIG3, a marker for radial glia and IPCs of thalamic
glutamatergic lineage (Fig. 1C) (18). RG1 was enriched for
genes associated with the Wnt pathway, including the Wnt
ligands WNT2B and WNT3, the Wnt co-receptor LRP6, and a
negative feedback regulator ZNRF3, consistent with the
major role of Wnt signaling in thalamus development
(Fig. 1C and fig. S1D) (19, 20). Other
enriched genes included GLI2, a mediator of 
sonic hedgehog signaling, and the Rho gua- 
nosine triphosphatase (GTPase) CPNES (fig. 
S1D). Despite the lack of canonical IPC marker 
expression, these cells also express LHX9, a 
marker associated with thalamic IPCs and 
glutamatergic neurons in mice and zebrafish 
(Fig. 1C) (17, 21). RG2 are likely radial glia of GABAergic
lineage, with representation from the prosomere 3-derived
prethalamus, based on the expression of FOXD1 and LHX5
(22), and from the prosomere 2-derived rostral thalamus,
based on the expression of NKX2.2 and OLIG2 (Fig. 1C and
fig. S1, C and D) (23-28). Other distinguishing marker
genes include ID3, SFRP2, GPC3, and the antisense RNA gene
LHX5-AS1 (Fig. 1C). ID3 was previously described as a
pan-radial glial marker in the mouse diencephalon,
including the thalamus (29), but ID3 appears to be specific
to RG2 and MES in humans (Fig. 1C). A sparse population of
PAX3- and LMO1-expressing prosomere 1 or pretectal
progenitors (5, 24, 30) and SHH- and SIM2-expressing zona
limitans intrathalamica (ZLI)
progenitors that reside between the thalamus 
and prethalamus (5, 31, 32) also cluster with 
RG2, which suggests that there was not enough 
representation from these populations to cluster separately
(fig. S1, C and D). We also identified two IPC populations.
IPC1 represents thalamic glutamatergic IPCs that express
canonical markers, including OLIG3, NEUROD1, NEUROG1,
NEUROG2, and NEUROD4 (Fig. 1C and fig. S1C) (17, 33). We
found additional markers for IPC1, including KIF18B,
C21ORF58, APOLD1, EYA2, and ARHGAP11B, a human-specific
gene that has been implicated in the evolutionary expansion
of the human neocortex (Fig. 1C) (34-38). Preferential
expression of ARHGAP11B in thalamic IPCs suggests that it
may be involved in the elaboration of the human thalamus,
similar to its known role in the cortex. IPCs of GABAergic
lineage (IPC2) from prosomeres 2 and 3 express the
GABAergic proneural transcription factor ASCL1 (Fig. 1C)
(17). This cluster was additionally enriched for the Wnt
ligand WNT7B, the N-methyl-D-aspartate (NMDA) receptor
subunit GRIN2A, the transcription factor GLIS3, the
epidermal growth factor receptor (EGFR) regulator LRIG1,
and the tumor suppressor gene FAM107A (Fig. 1C).
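
The marker logic used above to separate radial glia, IPCs, and neurons can
be written as a small rule set. The following Python sketch is an
illustration only: the `broad_class` function, the expression threshold,
and the example profiles are hypothetical, and the annotation in the study
relied on unsupervised clustering rather than fixed rules.

```python
IPC_MARKERS = ("NEUROD1", "NEUROG1", "NEUROG2", "ASCL1")


def broad_class(expression, min_expr=1.0):
    """Assign a broad class from normalized marker expression (gene -> value).

    min_expr is an arbitrary illustrative cutoff for calling a gene "on".
    """
    def is_on(gene):
        return expression.get(gene, 0.0) >= min_expr

    # Neuronal markers take precedence, because newborn neurons can still
    # carry IPC markers such as NEUROD1 and NEUROG2.
    if is_on("SLC17A6"):
        return "glutamatergic neuron"
    if is_on("GAD1"):
        return "GABAergic neuron"
    if any(is_on(gene) for gene in IPC_MARKERS):
        return "IPC"
    # PAX6 without IPC or neuronal markers is treated as radial glia.
    if is_on("PAX6"):
        return "radial glia"
    return "unassigned"


# Made-up expression profiles for four cells.
examples = {
    "cell_a": {"PAX6": 2.1, "OLIG3": 1.8},
    "cell_b": {"PAX6": 1.2, "NEUROG2": 3.0},
    "cell_c": {"SLC17A6": 2.5, "GBX2": 1.9, "NEUROD1": 1.1},
    "cell_d": {"GAD1": 3.3, "PAX6": 1.4},
}
labels = {name: broad_class(profile) for name, profile in examples.items()}
```

The ordering of the checks encodes the assumption that a neuronal marker
overrides progenitor markers, which matches the description of newborn
neurons that coexpress GBX2 with NEUROD1 and NEUROG2.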

Glutamatergic and GABAergic neurons were identified on the
basis of the expression of SLC17A6 and GAD1, respectively.
We identified two glutamatergic neuron populations in the
scRNA-seq dataset, excitatory neurons (ENs) and newborn
excitatory neurons (nENs), which pertain to thalamic
neurons based on their expression of the canonical markers
LHX9, LHX2, and GBX2 (Fig. 1C and fig. S1D) (39, 40). ENs
appear to be mature neurons based on their enrichment for
GAP43 (fig. S1D). These cells are additionally enriched for
markers including SGCZ, PLPPR1, and SNCA (Fig. 1C). nENs
are newborn neurons based on coexpression of GBX2, a marker
expressed in postmitotic glutamatergic neurons in the
thalamus (39-41), and the IPC markers NEUROD1 and NEUROG2
(Fig. 1C and fig. S1D). These cells are enriched for
CACNA2D1, DPYD, KALRN, and EYA2 (Fig. 1C and fig. S1D).
Three clusters represented GABAergic inhibitory neurons
(IN1, IN2, and IN3). IN1 and IN2 represent prosomere
3-derived prethalamic neurons given their expression of the
canonical markers PAX6, DLX5, SIX3, and ARX (16, 39, 42-44)
and the absence of FOXG1 expression, which is a
pan-telencephalic marker (45) (Fig. 1C and fig. S1D). IN1
is additionally enriched for ZNF804B, KCNMB2, ISL1, and
SST, a gene encoding the neurotransmitter somatostatin
(Fig. 1C and fig. S1D). IN2 is an LHX1-positive cluster
enriched for the antisense RNA gene DLX6-AS1 and RELN
(Fig. 1C). IN3 are likely prosomere 2- or rostral
thalamus-derived GABAergic neurons, based on the expression
of LHX1, NKX2.2, OTX2, SOX14, and genes encoding the
neurotransmitters neuropeptide Y (NPY) and enkephalin
(PENK) (fig. S1D) (18, 46-49). Whereas OTX2 and SOX14 are
also expressed in GABAergic neurons derived from the
midbrain and from the prosomere 1-derived pretectum, NKX2.2
expression is absent in pretectal and midbrain-derived
GABAergic neurons (18, 32, 50). Furthermore, these cells
lack expression of LMO1 and IRX3, markers of the caudal
diencephalon also expressed in midbrain-derived GABAergic
neurons, which suggests that the migratory stream of
interneurons from the midbrain has not emerged during the
first trimester (fig. S1C and fig. S3D) (5, 51, 52).

Fig. 1. Neurogenesis in the first-trimester thalamus. (A) Schematic overview
of the figure. We analyzed scRNA-seq data collected from three biologically
independent first-trimester thalamus samples (GW6 to GW10). Spatial
transcriptomics using MERFISH was performed to spatially map cell types
identified by scRNA-seq. t-SNE, t-distributed stochastic neighbor embedding.
(B) UMAP embedding of cell type annotations on the Harmony batch-corrected
space. EN, excitatory neuron; nEN, newborn excitatory neuron; IN1 to IN3,
inhibitory neurons 1 to 3; RG1 and RG2, radial glia 1 and 2; IPC1 and IPC2,
intermediate progenitor cells 1 and 2; MES, mesenchyme. (C) Dot plot of marker
genes across cell types and states. (D) H&E staining of a MERFISH-adjacent
coronal section of CS22 (GW10). D, dorsal; V, ventral; M, medial; L, lateral.
(E) MERFISH region annotations. MZ, thalamic mantle zone; GZ, thalamic
germinal zone; GE, ganglionic eminences. (F) Spatially projected cell
type annotations. (G) Relative proportion of cell types within regions from
(E). (H) Spatially projected expression of select marker genes, including
telencephalic (FOXG1) and diencephalic (TCF7L2 and LHX9) identity, dividing
cells (MKI67), radial glia and IPCs (OLIG3), and rostral thalamus-derived
GABAergic neurons (SOX14).

We performed MERFISH spatial transcrip- 
tomics on a coronal section from the GW10 
rostral human forebrain to spatially identify 
cells from the glutamatergic thalamic lineage 
identified from scRNA-seq (Fig. 1D). We iden- 
tified three distinct regions based on a hema- 
toxylin and eosin (H&E) staining of an adjacent 
slice: the medial and lateral ganglionic emi- 
nences (GEs), the thalamic germinal zone (GZ), 
and the thalamic mantle zone (MZ) (Fig. 1D). 
Our MERFISH analysis confirmed regional identity on the
basis of mutually exclusive expression of TCF7L2 and FOXG1
in the thalamus and GE, respectively (Fig. 1H). The GZ is
enriched for OLIG3, and the MZ is enriched for SLC17A6
(Fig. 1H and fig. S1F). MKI67 expression segregated the
ventricular zone (VZ) from the subventricular zone (SVZ)
(Fig. 1H). We performed clustering analysis of our MERFISH
dataset to relate the scRNA-seq populations to the regional
annotations (Fig. 1, E and F). We identified four clusters
that corresponded to IPC1, nEN, EN, and IN3 from our
single-cell sequencing dataset (Fig. 1F). IPC1, which
coexpress OLIG3 and EYA2, reside in the GZ (Fig. 1, G and
H, and fig. S1F). Newborn neurons (nEN) expressing GBX2 and
EYA2 are localized to the GZ and MZ, with a bias for the GZ
(Fig. 1, F to H, and fig. S1F). Mature LHX9- and
SLC17A6-expressing glutamatergic neurons (EN) are
predominantly found in the MZ (Fig. 1, F to H, and
fig. S1F). We also observed SOX14- and NKX2.2-expressing
GABAergic neurons (IN3) associated with rostral thalamus
origin in the MZ (Fig. 1, F to H, and fig. S1F). PAX6- and
OLIG3-expressing radial glia are observed at the VZ
(Fig. 1H and fig. S1F). However, because of damage at the
VZ, there were likely too few of these cells in our MERFISH
data to form a discrete cluster.
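
The relative proportion of cell types within each annotated region (as
summarized in Fig. 1G) amounts to a per-region tally. The Python sketch
below shows one way to compute it. The `proportions_by_region` helper and
the cell assignments are made up for illustration and do not reproduce the
study's counts.

```python
from collections import Counter, defaultdict


def proportions_by_region(assignments):
    """Return {region: {cell_type: fraction}} from (region, cell_type) pairs."""
    counts = defaultdict(Counter)
    for region, cell_type in assignments:
        counts[region][cell_type] += 1
    return {
        region: {
            cell_type: n / sum(tally.values())
            for cell_type, n in tally.items()
        }
        for region, tally in counts.items()
    }


# Hypothetical spatial assignments: (region annotation, cluster label).
assignments = [
    ("GZ", "IPC1"), ("GZ", "IPC1"), ("GZ", "nEN"), ("GZ", "EN"),
    ("MZ", "EN"), ("MZ", "EN"), ("MZ", "EN"), ("MZ", "nEN"),
]
proportions = proportions_by_region(assignments)
```

With these invented assignments, IPC1 makes up half of the GZ cells and EN
makes up three quarters of the MZ cells, mirroring the qualitative pattern
described in the text.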

Collectively, our analysis uncovers the diversity of cell
types in the human thalamus and their spatial distribution
during the first trimester. We identify OLIG3-positive
thalamic radial glia and IPCs and the glutamatergic neurons
that they likely give rise to, which classify broadly into
two molecular subtypes. In addition, we define ID3- and
ASCL1-expressing neural progenitors and their presumed
GABAergic neuron progeny, which can be classified into
three subtypes of diencephalon-derived GABAergic neurons.
These cell types express similar patterns of 
marker genes compared to what has been 
previously observed in other species in addi- 
tion to genes not previously associated with 
thalamic cell type-enriched expression. These 
findings suggest that the overall molecular 
blueprint of first-trimester diencephalic cell 
types is conserved in humans. The majority 
of GABAergic neurons in the thalamus has 
been reported to originate from extrathalamic 
regions, including the midbrain in mice (50, 53) 
and possibly the GEs in humans (54, 55). How- 
ever, we do not observe evidence for their 
presence in the first trimester, which suggests 
that these migratory streams develop later. 
Spatially, the thalamus has not yet partitioned 
into nuclei based on cytoarchitecture revealed 
by Nissl staining or by molecular profiles re- 
vealed by MERFISH (Fig. 1, D to F) (56). As 
expected for this stage of development, thala- 
mic cells segregate into layers that relate to the 
germinal zones, containing radial glia and IPCs, 
and the mantle zone, containing migrating 
and maturing neurons. 


Differentiation of thalamic neurons in the 
second trimester 


Next, to gain insight into the molecular mech- 
anisms and cellular composition of early 
thalamic nuclei, we analyzed scRNA-seq and 
single-nucleus RNA-seq (snRNA-seq) datasets 
from the second trimester of development, 
which were derived from 10 specimens (four 
female and six male) between GW16 and GW25.
This analysis retained 137,761 high-quality cells or
nuclei. For a subset of the samples, metadata on thalamic
subdivisions had been retained from microdissections along
the dorsal-ventral and rostral-caudal axes (6) and included
the pulvinar nucleus (fig. S4A). We identified 23 cell
clusters, including all major cell classes: glutamatergic
neurons (EN1 and EN2), GABAergic neurons (IN1 to IN6),
IPCs, astrocytes (AC1 and AC2), glial progenitor cells,
dividing cells, microglia, ependymal cells, oligo- 
dendrocyte progenitor cells, and oligoden- 
drocytes (Fig. 2A). We also identified several 
vascular cell classes including endothelial cells, 
pericytes, and fibroblasts that were annotated 
based on our recent transcriptomic atlas of 
the human cerebrovasculature (Fig. 2A) (23). 

Neural and glial progenitors were identified on the basis
of their expression of HES1 (fig. S3D). However, unlike in
the first trimester, we did not detect OLIG3+ clusters
associated with thalamic progenitor cells (fig. S3D), in
line with the early window of neurogenesis observed in the
mouse (57-59). We did identify MKI67+ dividing cells and
NEUROD1/LHX1+ IPCs that were enriched for MEIS1 and PAX3,
markers of pretectal progenitors (Fig. 2B and fig. S3D)
(5). This observation suggests ongoing neurogenesis in the
pretectum, consistent with prior findings in the
diencephalon (58). We identified two molecularly distinct
astrocyte populations, AC1 and AC2, based on their
expression of the known astrocyte markers AQP4 and GJA1
(60-63) (fig. S3D). These populations are differentiated by
LGR6 (AC1), a marker recently found to be enriched in
thalamic astrocytes in the adult brain (64), and SPARCL1
(AC2) (Fig. 2B) (65). Ependymal cells exhibited
transcriptomic profiles similar to those of astrocytes but
were distinguished based on expression of FOXJ1 (fig. S3D)
(66). We additionally observed a glial population that
expresses EGFR, ASCL1, OLIG1, and OLIG2 but not PDGFRA,
which is a signature of multipotent glial progenitor cells
that may produce both astrocytes and oligodendrocytes
(Fig. 2B) (67, 68). By contrast, oligodendrocyte progenitor
cells and oligodendrocytes were distinguished from glial
progenitor cells based on the expression of the canonical
markers PDGFRA and MBP, respectively (Fig. 2B) (69, 70).
These results suggest that the end of neurogenesis and the
onset of gliogenesis in the human thalamus occur by GW16,
which represents the earliest sample in our
second-trimester dataset.

Glutamatergic EN clusters were defined on the basis of
shared expression of LHX9 and SLC17A6 (Fig. 2B and
fig. S3D) and consisted of two clusters (EN1 and EN2). EN1
was enriched for neurotensin (encoded by NTS), calcyclin
(encoded by S100A6), and SOX2, whereas EN2 was enriched for
RNF220, CRTAC1, and FOXP2 (Fig. 2B and Fig. 3B). EN1 and
EN2 likely relate to the SOX2- and FOXP2-expressing
subtypes recently observed in the embryonic mouse thalamus
(5). Three classification schemes exist for thalamic
glutamatergic neurons in the adult brain for which
molecular markers have been identified. The core-matrix
model classifies thalamic neurons on the basis of whether
they project to layer IV or layer I in the cortex, and in
primates these classes can be molecularly identified based
on their expression of CALB1 or PVALB (71). The first-order
or higher-order model relates to thalamic nuclei and their
involvement in sensory relay pathways or transthalamic
corticocortical communication (72), and their
transcriptional signatures have been ascertained in mice
(73). Finally, transcriptomic profiling of the different
thalamocortical projection systems in mice has revealed
three major genetic profiles (primary, secondary, and
tertiary) that are found across thalamic pathways and can
be identified based on expression of TNNT1, NECAB1, or
CALB2 (74). We sought to determine whether any of these
classification schemes could be applied to the
second-trimester human brain on the basis of gene
expression. We first explored whether EN1 and EN2 could be
associated with core or matrix identity based on expression
of PVALB and CALB1, respectively. These genes were detected
in only a small number of cells, which suggests that they
turn on later in development (fig. S3D). We next sought to
determine whether EN1 and EN2 were associated with
transcriptional signatures of first-order and higher-order
nuclei or with primary, secondary, or tertiary profiles by
performing gene signature scoring using the UCell package
(75). We found that EN1 was enriched for signatures of
first-order nuclei compared with EN2 but was not enriched
for any of the primary, secondary, or tertiary profiles
(Fig. 3C). However, EN1 was enriched for CALB2, a marker
for tertiary profile neurons that relate to matrix neurons
in intralaminar nuclei (fig. S3D) (74). By contrast, EN2
was enriched for signatures of higher-order nuclei and
secondary profile neurons compared with EN1, including
NECAB1 (Fig. 3C and fig. S3D). Secondary profile neurons
relate to matrix neurons that are enriched in higher-order
nuclei (74). Collectively, our data suggest that the
genetic programs underlying sensory and higher-order relays
may begin to be established during prenatal development in
humans, before the onset of sensory experience. EN1 may be
enriched for neurons associated with first-order nuclei and
with matrix neurons in intralaminar nuclei based on
expression of CALB2. EN2 is associated with matrix neurons
enriched in higher-order nuclei.

Kim et al., Science 382, eadf9941 (2023)

Fig. 2. Cellular diversity in the second-trimester thalamus. (A) Constellation
plot representing cellular diversity in the second-trimester human thalamus.
PC, pericytes; OPC, oligodendrocyte progenitor cells; OL, oligodendrocytes; MG,
microglia; IN1 to IN6, inhibitory GABAergic neurons 1 to 6; GL, glial progenitor
cells; FB, fibroblasts; EP, ependymal cells; EN1 and EN2, excitatory glutamatergic
neurons 1 and 2; EC, endothelial cells; DIV, dividing cells; AC1 and AC2,
astrocytes 1 and 2. (B) Dot plot represents enrichment of select marker gene
expression across cell types in the second-trimester human thalamus.
[Fig. 2B dot plot not reproduced. Legible marker-gene labels: ST18, FSTL4,
GLIS3, ERVMER61-1, LINC01088, CHRM2, ZMAT4, DLX6-AS1, PDZRN3, ANGPT2, OTX2-AS1.]

Fig. 3. Spatially mapped thalamic glutamatergic neuron subtypes in the human
midgestation thalamus. (A) Eight sagittal sections were collected from three biologically
independent second-trimester thalamus specimens (GW16, GW21, GW23). Six
representative sections are shown here, and the other two are shown in figs. S6 and
S7. (B) Marker genes enriched for glutamatergic subtypes EN1 and EN2. (C) UCell
enrichment scores for gene signatures relating to known classes of thalamus neurons
from adult mouse studies calculated for glutamatergic neuron subtypes EN1 and EN2.
PFC, prefrontal cortex; V1, visual cortex; A1, auditory cortex; M1/S1, motor/somatosensory
cortex. (D) Nissl stains of sagittal sections. Sections are in order from left (lateral-most)
to right (medial-most). Scale bars, 2 mm. RT, reticular nucleus; PUL, pulvinar; LGN, lateral
geniculate nucleus; MGE, medial ganglionic eminence; Cd, caudate; GP, globus pallidus;
STN, subthalamic nucleus; VPL, ventral posterior lateral nucleus; MGN, medial geniculate
nucleus; MB, midbrain; VA, ventral anterior nucleus; VL, ventral lateral nucleus; CM,
centromedian nucleus; ZI, zona incerta; AV, anteroventral nucleus; AM, anteromedial
nucleus; LD, dorsolateral nucleus; MD, dorsomedial nucleus; PC, paracentral nucleus; PV,
paraventricular nucleus; SM, stria medullaris; ET, epithalamus. (E) Spatial clustering
analysis of MERFISH transcriptomics data. EN1 and EN2 from spatial clustering analysis
relate to EN1 and EN2 from scRNA-seq data based on gene expression profiles. EN-STN
and EN-MB are distinct clusters not represented by our scRNA-seq and relate to
glutamatergic neurons in the subthalamic nucleus and midbrain, respectively. (F) Spatial
feature plots showing the expression of two example marker genes for EN1 (SOX2 and
NTS) and EN2 (FOXP2 and CRTAC1) glutamatergic neuron subtypes in MERFISH datasets.

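
Gene signature scoring of the kind described above can be illustrated
with a simplified rank-based score. The Python sketch below is written in
the spirit of UCell but is not the UCell package or its API. The
normalization follows our reading of the published description and should
be treated as an assumption, the tie-breaking rule is arbitrary, and the
expression values and single-gene "signatures" are invented placeholders
built from the markers named in the text.

```python
def signature_score(expression, signature, max_rank=1500):
    """Rank-based signature score for one cell.

    expression: dict mapping gene -> expression value for a single cell.
    signature: list of genes; higher scores mean these genes rank nearer
    the top of the cell's expression profile.
    """
    if not signature:
        raise ValueError("signature must contain at least one gene")
    # Rank genes by decreasing expression (rank 1 = highest).
    # Ties are broken by gene name, an arbitrary choice for determinism.
    ordered = sorted(expression, key=lambda gene: (-expression[gene], gene))
    rank = {gene: position for position, gene in enumerate(ordered, start=1)}
    n = len(signature)
    # Genes ranked below max_rank, or absent from the cell, get a capped rank.
    capped = [min(rank.get(gene, max_rank + 1), max_rank + 1)
              for gene in signature]
    u_statistic = sum(capped) - n * (n + 1) / 2
    return 1.0 - u_statistic / (n * max_rank)


# Invented expression profile for one cell (values are not from the study).
cell = {"NECAB1": 8.0, "FOXP2": 6.0, "GAPDH": 5.0,
        "CALB2": 1.0, "SOX2": 0.5, "TNNT1": 0.0}

# Placeholder single-gene signatures; real signatures contain many genes.
scores = {
    "primary": signature_score(cell, ["TNNT1"], max_rank=5),
    "secondary": signature_score(cell, ["NECAB1"], max_rank=5),
    "tertiary": signature_score(cell, ["CALB2"], max_rank=5),
}
```

Because the score depends only on within-cell ranks, it is insensitive to
differences in sequencing depth between cells, which is the main reason
rank-based scoring suits comparisons across datasets.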
We also identified six clusters corresponding to GABAergic
neurons based on the expression of GAD1 and SLC32A1
(fig. S3D). IN1 expressed OTX2 and SOX14 but were mostly
negative for NKX2.2, a marker for prosomere 2-derived
GABAergic neurons, which suggests that they represent
midbrain-derived GABAergic neurons (fig. S3D) (53). These
cells were additionally enriched for OTX2-AS1, CHRM2,
ZMAT4, and LMO1 (Fig. 2B and fig. S3D). Because this
GABAergic neuron subtype was not detected in our
first-trimester data, we suspect that this population
represents a migratory stream that enters the thalamus
after GW10. IN3 expressed EBF1, EBF3, ISLR2, and ESRRB,
suggestive of pretectal identity (Fig. 2B and fig. S3D)
(29, 76). IN4 expressed markers of the thalamic reticular
nucleus, including SIX3, ISL1, and SST (Fig. 2B) (77). The
remaining three populations (IN2, IN5, and IN6) expressed
the pan-telencephalic marker FOXG1 (fig. S3D) (45).
Previous anatomical studies have suggested that in humans,
GE-derived GABAergic neurons migrate into the thalamus
(26, 27). FOXG1+ clusters could be distinguished on the
basis of differential enrichment for PDZRN3 in IN2; CRABP1,
ANGPT2, and MAF in IN5; and PENK, RARB, and RXRG in IN6
(Fig. 2A and fig. S3D). IN5 is reminiscent of a recently
described primate-specific CRABP1- and MAF-positive
interneuron population derived from the medial ganglionic
eminences (MGEs) (28).
In summary, we identified GABAergic popula- 
tions of diencephalic, midbrain, and telence- 
phalic origin, the latter two of which emerge 
during the second trimester of development in 
humans. 
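
The reasoning used above to infer the origin of each GABAergic population
from marker combinations can be condensed into a small decision function.
This Python sketch is illustrative only: `gabaergic_origin` is a
hypothetical helper, the rules encode the second-trimester interpretation
given in the text (for example, OTX2 and SOX14 without NKX2.2 read as
midbrain-derived), and real assignments rested on cluster-level expression
rather than on per-cell presence or absence.

```python
def gabaergic_origin(markers):
    """Map a set of detected marker genes to a presumed developmental origin."""
    # FOXG1 is treated as a pan-telencephalic marker.
    if "FOXG1" in markers:
        return "telencephalon"
    # OTX2 and SOX14 together: NKX2.2 separates rostral thalamus from midbrain.
    if {"OTX2", "SOX14"} <= markers:
        return "rostral thalamus" if "NKX2.2" in markers else "midbrain"
    # Markers reported for the thalamic reticular nucleus.
    if {"SIX3", "ISL1", "SST"} <= markers:
        return "prethalamus (reticular nucleus)"
    # Markers described as suggestive of pretectal identity.
    if {"EBF1", "EBF3"} <= markers:
        return "pretectum"
    return "unresolved"


# Marker sets loosely modeled on the populations described in the text.
calls = {
    "IN1-like": gabaergic_origin({"OTX2", "SOX14", "LMO1"}),
    "IN3-like": gabaergic_origin({"EBF1", "EBF3", "ISLR2"}),
    "IN4-like": gabaergic_origin({"SIX3", "ISL1", "SST"}),
    "IN5-like": gabaergic_origin({"FOXG1", "CRABP1", "MAF"}),
}
```
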


Spatial registration of thalamic 
neuron diversity 


To investigate the spatial distribution of cell 
types in the developing human thalamus, we 
performed MERFISH on eight sagittal sec- 
tions in total collected from three biologically 
independent midgestation thalamus speci- 
mens using the Vizgen platform (Fig. 3A). 
These sections collectively captured the major- 
ity of thalamic nuclei as annotated by the 
Bayer and Altman atlas of the second-trimester 
human brain (Fig. 3D) (78). Our gene panel was 
designed using cluster markers from second- 
trimester scRNA-seq and snRNA-seq data, 
which allowed us to visualize glutamatergic 
and GABAergic neurons, astrocytes, oligoden- 
drocytes, microglia, endothelial cells, pericytes, 
and fibroblasts (table S3 and Figs. 3, 4, and 5). 

We first analyzed the distribution of gluta- 
matergic clusters EN1 and EN2 (Fig. 3E and 
figs. S5B, S7B, S8, and S9). We observed an 
enrichment for EN1 neurons in rostral regions 
of the thalamus, with a greater abundance of 
these cells observed in the anteroventral nu- 
cleus and anteromedial nucleus, the dorso- 
lateral nucleus, the ventral anterior nucleus, 
the paracentral nucleus, and the paraventric- 
ular nucleus (Fig. 3E). By contrast, EN2 ex- 
hibited a more caudal bias, with an enriched 
distribution in the dorsomedial nucleus, the 
ventral lateral nucleus, and the centromedian 
nucleus (Fig. 3E). Sensory nuclei, such as the 
lateral geniculate nucleus, medial geniculate 
nucleus, and ventral posterior lateral nucleus, 
contained similar abundances of both EN1 and 
EN2 populations (Fig. 3E and fig. S7B). The 
pulvinar, which represents an evolutionarily 
expanded thalamic nucleus (79), showed an 
overall enrichment for EN1 in the lateral-most 
section and an enrichment for EN2 in more 
medial sections (Fig. 3E). This difference with- 
in the pulvinar may relate to its functional 
organization, whereby the lateral pulvinar is 
involved with the visual pathway and the me- 
dial pulvinar is involved with multisensory, 
associative, and higher-order cognitive function (3, 80).
The overall biased distribution of EN1 and EN2 neurons is
consistent with observations in the embryonic mouse
thalamus (5). The epithalamus, which is a neighboring
diencephalic region derived from prosomere 2 like the
thalamus, contained glutamatergic neurons associated with
EN1 signatures (Fig. 3E). This is in
line with previously reported similarities in 
gene expression profiles between the epithal- 
amus and thalamus (29, 81). However, other 
neighboring regions, including the subthalamic 
nucleus and midbrain, contained glutamater- 
gic neurons that clustered separately from EN1 
and EN2 (EN-STN and EN-MB) (Fig. 3E). 

We next analyzed the distribution of 
GABAergic neuron subtypes revealed by our 
MERFISH data (Fig. 4A and figs. S6A, S7C, and
S10). Midbrain-derived OTX2- and SOX14-positive,
NKX2.2-negative cells (IN1) represented
the most abundant population of GABAergic 
neurons in the thalamus, in line with recent 
descriptions in the mouse and marmoset (53) 
(Fig. 4, B and D, and figs. S6B and S7C). In hu- 
mans, these cells were distributed throughout 
the thalamus (Fig. 4, B and D, and figs. S6B 
and S7C), unlike in mice, where they are en- 
riched in the lateral geniculate nucleus (53). 
GABAergic neurons derived from the prethal- 
amus (IN4) and telencephalon-derived neu- 
rons (IN2, IN5) were enriched in the reticular 
nucleus and zona incerta but were also found 
sparsely distributed across the thalamic nuclei 
(Fig. 4, B and D, and fig. S7C). An abundant IN2
population was also present in the MGE and
basal ganglia structures that were visible in
our sections, such as the caudate and globus
pallidus, consistent with its telencephalic
identity (Fig. 4B). IN5 was enriched in the
epithalamus (fig. S7). Finally, a FOXG1-positive
IN6 cluster was observed in the caudate but 
not the thalamus, which suggests that this 
sparse population may have been accidentally 
included in the scRNA-seq and snRNA-seq 
data generation as a result of inaccurate dis- 
section (Fig. 4B). We also could not detect IN3, 
which we hypothesized were pretectal GABAergic 
neurons, likely because of a lack of represen- 
tation of pretectum in our spatial transcrip- 
tomics datasets. 

We sought to validate the existence of telencephalon-derived
cells in the thalamus by performing immunostaining against
FOXG1 in a biologically independent GW19 sample. Nissl
staining revealed the proposed migratory stream between the
MGE and the thalamus previously observed in humans, the
corpus gangliothalamicus, which is defined as a thin stream
of cells starting from the stria terminalis and lining the
dorsal edge of the thalamus (Fig. 4D). We observed that
cells within this migratory stream stained positive for
FOXG1, confirming telencephalic identity (Fig. 4, F and G).
Whereas previous studies have suggested that these cells
migrate predominantly to mediodorsal and pulvinar nuclei,
we observe FOXG1+ cells sparsely throughout the thalamus,
with enrichment in the reticular nucleus (Fig. 4, F and G,
and fig. S11). We additionally performed in situ
hybridization against FOXG1 and CRABP1 across seven
sections in total collected from three biologically
independent thalamus samples. This analysis confirmed the
presence of FOXG1- and CRABP1-expressing cells within the
thalamus and epithalamus and their enrichment in the
reticular nucleus, corroborating our MERSCOPE results
(fig. S11).

RESEARCH | BRAIN CELL CENSUS

Fig. 4. Spatially mapped GABAergic neuron subtypes across the human
midgestation thalamus. (A) Nissl stains of sagittal sections. Sections
are in order from left (lateral-most) to right (medial-most). Scale bars, 2 mm.
(B) Spatial clustering analysis of MERFISH transcriptomics data depicting
GABAergic neuron subtypes. (C) Expression of GABAergic neuron subtype
markers across the sections. (D and E) Nissl staining (D) and immunostaining (E)
for FOXG1 and TCF7L2 on a lateral sagittal section from a biologically
independent GW19 thalamus. Arrows in the Nissl stain highlight the corpus
gangliothalamicus, which is a thin migratory stream from the MGE to the
thalamus. Small dotted boxes are zoomed-in images depicted on the right.
Large dotted box is a zoomed-in image depicted in (F). Scale bars for Nissl
and low-magnification images, 2 mm. Scale bars for high-magnification images,
50 μm. St, stria terminalis. (F) Zoomed-in image of the dorsal region of the
thalamus section depicted in (E) to highlight FOXG1 staining within and below
the corpus gangliothalamicus (highlighted by the arrows). Scale bar, 2 mm.
(G) High-magnification (63×) images of regions highlighted by arrows in (F)
highlighting FOXG1 staining within the corpus gangliothalamicus, in order from
left to right. Scale bars, 50 μm.

Fig. 5. Spatially mapped nonneural cell subtypes across the human
midgestation thalamus. (A) Nissl stains of sagittal sections. Sections are in order
from left (lateral-most) to right (medial-most). Scale bars, 2 mm. (B) Spatial
clustering analysis of MERFISH transcriptomics data depicting nonneuronal
subtypes. AC1 and AC2, astrocyte subtypes; GL, bipotent glial progenitor cells.
(C) Spatial feature plots depicting normalized expression of EP (FOXJ1), AC1
(LGR6), and AC2 (SPARCL1) markers. (D) Relative proportion of nonneural
cell types across lateral-to-medial sections. (E) Violin plots depicting normalized
expression of glial, astrocyte, ependymal, and oligodendrocyte precursor
marker genes across the two medial-most GW23 sections.
[Fig. 5 panels not reproduced. Legible legend entries: PC, OPC, OL, GL.
Legible violin plot titles: PDGFRA, SPARCL1, MKI67.]


Spatial registration of nonneuronal diversity
in the thalamus

We finally assessed the distribution of non- 
neuronal cell types (Fig. 5B and fig. S7D). 
LGR6+ astrocytes (AC1) appeared to be the 
most abundant glial population in lateral sec- 
tions, whereas the bipotent glial progenitors 
were the most abundant in medial sections 
(Fig. 5, B to E), consistent with the lateral-to- 
medial maturation gradient that was previ- 
ously observed in the context of neurogenesis 
in the rodent thalamus (58, 82). SPARCL1+
AC2 astrocytes were enriched in extrathalamic 
regions, such as the midbrain and reticular 
nucleus, but were also present to a lesser degree 
in the thalamus (Fig. 5, B and C, and fig. S7D). 
FOXJ1+ ependymal cells were enriched along 
the periphery of the thalamus and within the 
centromedian nucleus (Fig. 5, B and C, and fig. 
S7D). Other cell types, including dividing cells, 
oligodendrocyte progenitor cells, oligodendro- 
cytes, microglia, pericytes, and fibroblasts, were 
found distributed broadly throughout the thal- 
amus (Fig. 5, B to D). 


Thalamic nuclei exhibit differences in gene 
expression during the second trimester 


We identified several genes enriched in an- 
atomically defined thalamic nuclei. For exam- 
ple, INHBA, TLL1, ITGA8, and KCTD8 were 
enriched in the centromedian nucleus (fig. 
S14). The ventral anterior nucleus was highly
enriched for PENK, GABRG1, TSHZ2, and CDH9
(fig. S14). The reticular nucleus showed
enriched expression of CNTNAP4, EPHA5, PDE1C,
and PVALB (fig. S14). The zona incerta was
highly enriched for LHX5, PAX6, ISL1, and
DLX2 (fig. S14). Given the spatial bias we
observed for many neuron-enriched genes, we
sought to determine whether thalamic nuclei
can be molecularly distinguished in midgestation
in the human brain. We designed a gene
panel containing 140 genes based on top
markers for neuronal cell types and performed
MERSCOPE on a sagittal section from a GW21
human thalamus. Clustering analysis revealed 
segmentation of the thalamus into regions 
reminiscent of thalamic nuclei independently 
identified by Nissl staining (fig. S14). These 
results confirm that spatial variation of neu- 
ronal gene expression parcellates the thala- 
mus into molecularly defined nuclei as early 
as the second trimester. 
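
Enrichment of a gene in one nucleus relative to the rest of the section
can be expressed as a log2 ratio of mean expression. The Python sketch
below shows that calculation under stated assumptions: the
`log2_enrichment` helper, the pseudocount, and the per-cell values are
hypothetical, and the study's own enrichment statistics may have been
computed differently.

```python
import math


def log2_enrichment(values_by_region, target, pseudocount=1.0):
    """log2 ratio of mean expression inside `target` versus all other regions.

    values_by_region: dict mapping region name -> list of per-cell values
    for a single gene. The pseudocount avoids division by zero.
    """
    inside = values_by_region[target]
    outside = [value
               for region, values in values_by_region.items()
               if region != target
               for value in values]
    mean_inside = sum(inside) / len(inside)
    mean_outside = sum(outside) / len(outside)
    return math.log2((mean_inside + pseudocount) / (mean_outside + pseudocount))


# Invented per-cell PENK values in three annotated nuclei.
penk = {
    "VA": [7.0, 7.0],   # ventral anterior nucleus
    "CM": [1.0, 1.0],   # centromedian nucleus
    "RT": [1.0, 1.0],   # reticular nucleus
}
va_enrichment = log2_enrichment(penk, "VA")
cm_enrichment = log2_enrichment(penk, "CM")
```

A positive value indicates higher mean expression inside the target
nucleus than elsewhere, and a negative value indicates the reverse.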


Discussion 


Our study identifies the molecular trajectories 
of neuronal differentiation in the developing 
human diencephalon and suggests three prin- 
ciples of neuronal differentiation and spa- 
tial organization: (i) Human thalamic neuronal 
subtypes begin to emerge during the first tri- 
mester of development and start to organize 
into spatially distinct nuclei during the second 
trimester. (ii) Molecularly defined subtypes of
glutamatergic neurons are shared across thalamic
nuclei, and their composition can be
used to distinguish first-order and higher-order 
thalamic nuclei. (iii) We detect six molecular 
subtypes of GABAergic neurons that contrib- 
ute to the developing diencephalon in hu- 
mans. These include prethalamus-derived, 
telencephalon-derived, and midbrain-derived 
GABAergic neurons that together contribute 
substantially to the cellular composition of 
diencephalic subregions in humans. In partic- 
ular, we find that midbrain- and telencephalon- 
derived GABAergic neurons appear in the 
developing thalamus as early as the second 
trimester of development. 

Our study confirms previously hypothesized 
migration of telencephalon-derived GABAergic 
neurons into the thalamus, which may con- 
tribute to the expansion of this structure in 
humans (54, 55). These cells maintain expression 
of FOXG1 during the second trimester 
and form a thin stream along the dorsal edge 
of the thalamus that coincides with the corpus 
gangliothalamicus, a migratory pathway be- 
tween the MGE and the thalamus that was 
previously identified using Golgi and Nissl 
stains (54, 55, 83). In addition to the previously 
identified localization of these cells in medio- 
dorsal and pulvinar nuclei, our results indi- 
cate that FOXG1-positive cells appear to migrate 
throughout the thalamus, with particular en- 
richment in the reticular nucleus. We also 
observe their presence in the epithalamus, 
which suggests a greater caudal dispersion 
of these cells across prosomere 2 of the diencephalon 
than previously appreciated. Finally, a subset of 
FOXG1-expressing cells expresses CRABP1, a gene 
involved in the retinoic acid signaling pathway. 
CRABP1-positive GABAergic neurons have recently been 
identified in studies of primate forebrain interneuron 
diversity (28, 32). Consistently, CRABP1-positive 
GABAergic neurons expressed LHX6 and MAF 
in our single-cell sequencing data—suggesting 
MGE origin—but did not express TAC3. 

Together, our findings underscore the com- 
plexity of neurodevelopmental cell lineage rela- 
tionships in primates and provide a cellular 
substrate that could contribute to the evolu- 
tion of the human thalamus. Additional work is 
needed to further characterize the molecular 
diversity of neural stem and progenitor cells in 
the first-trimester human diencephalon and 
molecular mechanisms underlying their fate 
specification. These include cells that give rise 
to the epithalamus and pretectum, which were 
not a focus of the current work. Furthermore, 
the ontogeny of cell types in the thalamus, our 
understanding of which largely comes from rodent 
studies, will need to be confirmed using lineage 
tracing or fate mapping tools in human and 
nonhuman primate model systems. 
The developmental origins of GABAergic sub- 
types are of particular interest. The majority of 
GABAergic neurons in the mouse thalamus 
are midbrain derived and reside in the dorsal 
lateral geniculate nucleus, but the number of 
GABAergic neurons appears to be significantly 
increased in primates, and these neurons are 
distributed throughout the thalamus. It would 
be interesting to confirm whether the most 
abundant GABAergic subtype, SOX14+ IN1, is 
truly midbrain derived, and whether FOXG1+ 
telencephalon-derived subtypes originate pri- 
marily from the MGE or also from the other 
GEs. Furthermore, comparative studies are re- 
quired to determine whether telencephalon- 
derived GABAergic neurons are specific to humans, 
as has previously been hypothesized (54, 55). 
Finally, future studies are required to determine 
whether GABAergic neuron subtypes in the hu- 
man thalamus exhibit enrichment in first- or 
higher-order nuclei and whether they exhibit 
differences in their physiological properties or 
patterns of connectivity at later time points, 
after their migration has concluded. The 
thalamus has been increasingly implicated in 
neuropsychiatric disorders, including autism 
spectrum disorder, schizophrenia, bipolar dis- 
order, and attention-deficit hyperactivity dis- 
order (84-89). Studies of evolutionarily divergent 
cell types may provide inroads into better 
understanding human disease. 


Materials and methods 
Single-cell data processing 


scRNA-seq FASTQs were aligned and quantified 
with STARsolo 2.7.10a with specific parameters 
for multimapping with expectation-maximization, 
intron inclusion, and cell barcode and UMI correction: 
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts 
--soloUMIdedup 1MM_CR 
--soloFeatures GeneFull_Ex50pAS Velocyto 
--soloMultiMappers EM 
--clipAdapterType CellRanger4 
--soloUMIfiltering MultiGeneUMI_CR. 
The reference genome annotation is a modified 
version of the annotation provided by 10x Genomics 
(2020-A), in which ambiguous transcripts were 
removed or modified. The annotations are 
provided at https://storage.googleapis.com/ 
generecovery/human_GRCh38_optimized_ 
vl_1_velocyto_exonic.gtf.gz. For calling cells, 
we initially filtered for a minimum of 500 genes 
expressed per cell, followed by DropletQC thresholding 
for empty droplets based on the spliced and 
unspliced ratio from STARsolo's Velocyto 
quantification. DropletQC rescue parameters 
were set at 0.1 for nf_rescue and 1000 for 
umi_rescue. Ambient RNA was then removed from 
the DropletQC-filtered matrix with FastCAR, with 
the recommended cutoff, contaminationChanceCutoff 
set to 0.05, and stop iteration set to 200. The 
ambient RNA-corrected matrix was then used to 
compute doublet predictions with the hybrid mode 
of scds. The threshold for the scds hybrid score 
was set in steps according to the number of cells 
called, with more stringent doublet filtering for 
samples with a higher number of cells called. 




The final filtering step was a mitochondrial 
threshold, set at a maximum of 20% mitochondrial 
content. 
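
The first and last thresholds in this section can be illustrated with a short per-cell filter. This is a minimal Python sketch, not the authors' pipeline: the `Cell` record and its field names are hypothetical, and the DropletQC, FastCAR, and scds steps are omitted. Only the 500-gene minimum and the 20% mitochondrial maximum come from the text.

```python
from dataclasses import dataclass

MIN_GENES = 500            # minimum genes expressed per cell (from the text)
MAX_MITO_FRACTION = 0.20   # maximum mitochondrial content (from the text)


@dataclass
class Cell:
    """Hypothetical per-cell summary; field names are illustrative only."""
    barcode: str
    n_genes: int      # number of genes with at least one count
    total_umis: int   # total UMI counts in the cell
    mito_umis: int    # UMI counts assigned to mitochondrial genes


def mito_fraction(cell):
    """Fraction of UMIs from mitochondrial genes (0.0 for an empty cell)."""
    return cell.mito_umis / cell.total_umis if cell.total_umis else 0.0


def passes_qc(cell):
    """Apply the gene-count minimum and the mitochondrial maximum."""
    return cell.n_genes >= MIN_GENES and mito_fraction(cell) <= MAX_MITO_FRACTION


def filter_cells(cells):
    """Keep only the cells that pass both thresholds, preserving order."""
    return [c for c in cells if passes_qc(c)]
```

In this sketch both thresholds are inclusive, so a cell with exactly 500 genes and exactly 20% mitochondrial content is kept; the text does not say how boundary cases were handled.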


Single-cell analysis 


Seurat’s SCTransform v2 workflow was used 
for normalization, set to the 2000 most variable 
genes, followed by filtering of mitochondrial genes, 
ribosomal genes, and MALAT1. The residuals 
were input into principal components analysis 
(PCA) and batch corrected with harmony over 
20 principal components for the first trimester 
and 30 for the second trimester with theta = 2 
over the sample metadata. Uniform manifold 
approximation and projection (UMAP) dimen- 
sional reduction was done on the harmony 
corrected components with min.dist = 0.3. 
De novo clustering was done with Louvain with 
multilevel refinement with the resolution set to 
0.5 for the first trimester and 0.4 for the second 
trimester. Transcription factor (TF) activity 
was computed using Dorothea and pathway 
activity with progeny. Visualization tools used 
include dittoSeq, SCpubr, and scCustomize. All 
analysis scripts are available at https://github. 
com/cnk113/thalamus-analysis. 
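
The settings reported above can be collected in one place, together with a sketch of the gene filtering step. This is an illustration in Python rather than the authors' R code. The prefix rules (`MT-` for mitochondrial genes, `RPS` and `RPL` for ribosomal genes) are an assumption based on common human gene symbol conventions; the text does not state which patterns were used.

```python
# Per-trimester settings as reported in the text.
PARAMS = {
    "first":  {"n_pcs": 20, "resolution": 0.5},
    "second": {"n_pcs": 30, "resolution": 0.4},
}
N_VARIABLE_GENES = 2000   # SCTransform variable features
HARMONY_THETA = 2         # Harmony theta, applied over the sample metadata
UMAP_MIN_DIST = 0.3       # UMAP min.dist

# Assumed symbol conventions (not stated in the text).
MITO_PREFIX = "MT-"
RIBO_PREFIXES = ("RPS", "RPL")
EXCLUDED_GENES = {"MALAT1"}


def is_excluded(gene):
    """True for mitochondrial genes, ribosomal genes, and MALAT1.

    Caveat: a bare prefix match also catches non-ribosomal genes such as
    RPS6KA1, so a real analysis may need a stricter pattern.
    """
    return (
        gene.startswith(MITO_PREFIX)
        or gene.startswith(RIBO_PREFIXES)
        or gene in EXCLUDED_GENES
    )


def filter_variable_genes(genes):
    """Drop excluded genes from a list of variable genes, keeping order."""
    return [g for g in genes if not is_excluded(g)]
```

For example, `filter_variable_genes(["MT-CO1", "RPL13", "MALAT1", "FOXG1"])` keeps only `FOXG1`.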


MERSCOPE panel design, tissue preparation, 
and imaging 


We selected a panel of 140 genes to perform 
MERSCOPE spatial transcriptomics profiling 
using Vizgen’s commercial platform. Genes 
were selected on the basis of cluster markers 
identified from scRNA-seq and snRNA-seq data, 
including canonical and novel markers against 
neuronal, glial, and nonneural cell types in the 
first- and second-trimester thalamus. The full 
gene panel is available in table S3. A second 
panel of 140 genes was designed to focus on 
markers associated with neuronal clusters in 
the second-trimester thalamus for the experi- 
ment described in fig. S14 and is also available 
in table S3. 

The panel was used to stain 10-µm-thick 
sections generated from thalamus tissue (GW10, 
GW16, GW21, GW23). Thalamus tissue was 
dissected from an intact hemisphere on the day 
of collection, flash frozen in a liquid nitrogen- 
chilled isopentane bath, and then immediately 
frozen in ice-cold optimal cutting temperature 
(OCT) compound while documenting its ori- 
entation relative to the rest of the brain. The 
tissue was serially sectioned in the sagittal ori- 
entation on a Leica CM1860 using a specimen 
head temperature of −15°C. One 100-µm-thick 
section was used for generating RNA for con- 
firming a RIN score greater than seven as a 
quality control measurement. RNA was iso- 
lated using the Direct-Zol RNA purification 
kit (Zymo) according to the manufacturer’s 
recommendations. RNA integrity number (RIN) 
scores were measured using the Agilent RNA 
6000 Pico Kit. The mediolateral position of subsequent 
sections was gauged by performing a 
Nissl stain using the NeuroTrace 500/525 dye 
(ThermoFisher) and comparing the cytoarchi- 
tecture with the Bayer and Altman Reference 
Atlas. Once we identified a suitable region of 
the thalamus for spatial profiling, we collected 
two sections on Superfrost Plus slides, one for 
Nissl staining and one for RNAscope, and col- 
lected a subsequent section for MERSCOPE on 
a glass coverslip provided by Vizgen. 

Tissue preparation and imaging were performed 
following Vizgen’s recommended procedures 
for fresh-frozen tissue. The MERSCOPE 
slide with the tissue mounted was fixed using 
4% paraformaldehyde (PFA) for 15 min, washed 
three times in phosphate-buffered saline (PBS), 
and permeabilized overnight in 70% ethanol 
at 4°C. The next day, encoding probe hybrid- 
ization was performed for 36 to 48 hours, fol- 
lowed by gel embedding and clearing for 24 to 
96 hours following the nonresistant fresh- 
frozen tissue guidelines. Finally, the tissue was 
stained using the 4’,6-diamidino-2-phenylindole 
(DAPI) and PolyT Staining Reagent provided 
by the kit for 15 min. The MERSCOPE 
instrument was configured, and the tissue was 
imaged according to the Vizgen MERSCOPE 
Instrument User Guide. 


Immunohistochemistry and fluorescence 
in situ hybridization 


Fluorescence in situ hybridization (FISH) was 
performed using the RNAscope Multiplex Fluorescent 
v2 Assay against human FOXG1 (probe 
ID: 420991-C3) and CRABP1 (probe ID: 855321-C2) 
according to the manufacturer’s protocol 
for fresh-frozen tissue. Immunohistochemistry 
was performed on 10-µm-thick cryosections 
generated from GW19 thalamus tissue that 
had been fixed in 4% PFA overnight, dehydrated 
in 30% sucrose for 48 hours, OCT compound-embedded, 
and stored at −80°C. Antigen retrieval 
was performed before immunostaining 
by incubating slides for 15 min in citrate buffer 
(pH 6) preheated to 95°C using a pressure 
cooker. Slides were washed three times in PBS 
containing 0.1% Tween-20, and simulta- 
neously permeabilized and blocked using PBS 
containing 5% bovine serum albumin (BSA) 
and 0.3% Triton-X. Slides were stained over- 
night at 4°C in blocking buffer containing 
antibodies against FOXG1 (Abcam 196868, 
1:500 dilution) and TCF7L2 (Millipore 05-511 
clone 6H5-3, 1:250 dilution). The next day, 
slides were washed three times for 10 min 
each in PBS containing 0.1% Tween-20 and 
incubated in blocking buffer containing sec- 
ondary antibodies (ThermoFisher A32766 and 
A32795, each diluted 1:1000) for 1 hour. Slides 
were washed three times in PBS, stained with 
DAPI (ThermoFisher 62247), and mounted 
using Prolong Gold (Invitrogen, P36930) and 
#1.5 coverslips (Azer Scientific, 1152460). 
Tilescan images were collected using the 
Leica Widefield microscope equipped with a 
Hamamatsu Flash 4.0 camera using either a 
20X air objective for RNAscope or a 10X air ob- 
jective for immunostaining. High-magnification 
images were collected using a Leica SP8 con- 
focal microscope with a 63X oil objective. 


Spatial transcriptomics analysis 


Cell segmentation was performed using the 
polyT and DAPI stains. The output of Vizgen’s 
processing software with DAPI segmentation 
was used for input into Seurat’s SCTransform 
v2 normalization. The matrix was initially fil- 
tered for a minimum of 20 counts and a maxi- 
mum of 100 unique genes expressed per cell. 
Label transfer from second-trimester to spatial 
data was done using FindTransferAnchors and 
MapQuery on the harmony reductions with 
cluster labels. All analysis scripts are available at 
https://github.com/cnk113/thalamus-analysis. 
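
The initial cell filter described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the input layout (a mapping from cell ID to per-gene counts) and the function names are hypothetical, and only the two thresholds come from the text.

```python
MIN_COUNTS = 20          # minimum total transcript counts per cell (from the text)
MAX_UNIQUE_GENES = 100   # maximum unique genes expressed per cell (from the text)


def keep_cell(gene_counts):
    """Decide whether one segmented cell passes the initial filter.

    gene_counts maps gene symbol -> transcript count for a single cell.
    """
    total = sum(gene_counts.values())
    n_unique = sum(1 for count in gene_counts.values() if count > 0)
    return total >= MIN_COUNTS and n_unique <= MAX_UNIQUE_GENES


def filter_spatial_cells(cells):
    """Return only the cells that pass, keyed by the same cell IDs."""
    return {cell_id: counts for cell_id, counts in cells.items() if keep_cell(counts)}
```

Genes with a zero count are not treated as expressed here, so a sparse dictionary and a dense one with explicit zeros give the same result.
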
EDITORIAL 


Groundhog Day 


As a longtime academic and university administrator 
in the United States, I’ve attended more meetings 
than I can remember about what “counts” 
toward faculty hiring, promotion, tenure, and 
other rewards at institutions of higher education. 
These meetings always followed a similar pat- 
tern—and sadly, still do. A wise colleague referred 
to these meetings as “Groundhog Day,” after the movie 
about a single day that repeats over and over again. The 
process of assessing faculty success has not changed in 
about 70 years, despite a lot of talk to change things. Why 
can’t universities break out of these ossified patterns? 

The public, politicians, university trustees, and many 
on the campus itself want to see various laudable ac- 
tivities added to the roster of achievements that are 
recognized for promotion, including 
innovation that leads to commer- 
cial activity, public service, and ex- 
ceptional teaching. At the meetings 
where establishing these changes is 
discussed, everyone agrees that the 
reward system should evolve in this 
way. And it is then proposed that 
these accomplishments be included 
in the requirements for promotion 
and tenure. But everyone eventually 
concurs that the faculty who vote on 
tenure and the administrators who 
approve those votes mostly ignore 
such activities, and everyone ends 
up only considering a candidate’s research, which is 
quantified primarily by one’s record of publications and 
grant awards. The latest suggestions to invigorate the 
process are the use of artificial intelligence to somehow 
analyze a candidate’s research or the establishment of 
different forms of publication as alternatives to tradi- 
tional measures of success. The smart money says these 
ideas are unlikely to change the status quo. 

Why is the bar for success set at the number of high- 
profile publications and grant awards? In the middle 
of the 20th century, government grants in the United 
States became important for research and therefore for 
career advancement at universities. In turn, peer re- 
view for grant applications and for publishing research 
gained a stronger foothold in the assessment of faculty 
quality. For the past 70 years, the criteria for success 
as a faculty member have relied on these two factors. 
Faculty dossiers have certainly gotten longer and more 
elaborate with teaching statements, patents, invention 
disclosures, descriptions of service to scientific societies 
and the university, evidence of engagement locally and 
with the broader public, and more. But very little evidence 
suggests that the road to promotion is paved with 
anything other than traditional peer-reviewed publica- 
tions and grant dollars—so much so that studies of the 
impact of various factors on research success focus on 
these metrics alone. 

This resistance to change is a source of frustration 
to many who want to improve academia. The lack of 
respect for great teaching has clearly contributed to the 
erosion of public support for higher education, yet it 
has been almost impossible for universities to award 
promotion and tenure to excellent professors who don’t 
have a powerful research résumé. After passage of the 
Bayh-Dole Act in 1980, which allows commercializa- 
tion of research findings in the United States, numer- 
ous attempts were made by university 
innovators and advocates for technol- 
ogy transfer to have patents or inven- 
tion disclosures contribute to tenure 
considerations, but these have also 
largely failed. Efforts to count public 
outreach and service have also met 
with little success, although such en- 
gagement is encouraged by universi- 
ties and highlighted on their websites 
and in brochures. Many commend- 
able reforms to the system for schol- 
arly publication have also failed to 
dislodge peer-reviewed papers as the 
main indicator of research excellence. 
And in the end, getting a bunch of full professors to 
raise their hands to promote a colleague for externally 
imposed reasons runs afoul of the principles of aca- 
demic freedom and shared governance. 

If great teaching is as important as universities claim 
in mission statements, then they should reward such ex- 
cellence with career advancement. The same holds for 
faculty activities that are praised by so many, but barely 
noticed when it’s important. If these endeavors don’t 
count toward promotion and tenure, then universities 
should stop giving faculty the impression that activi- 
ties other than publishing and garnering grants matter. 
When I was a young professor, I dutifully showed up at 
a committee meeting that I had been asked to attend by 
the administration, where a grizzled veteran professor 
told me that I was wasting my time by being there in- 
stead of grinding out grants and publications. 

Maybe one day some new measure of success will 
come along that can stand up to money and papers. But 
until then, it’s still Groundhog Day. 

-H. Holden Thorp 


H. Holden Thorp 
Editor-in-Chief, 
Science journals. 
hthorp@aaas.org; 
@hholdenthorp 
Engineer Narges Mohammadi, winner of this year’s Nobel Peace Prize, in The New York Times. 
She studied physics in Iran before her work promoting civil society, for which she is imprisoned. 


Edited by Jeffrey Brainard 


The very rare Marsdenia chirindensis, found in Zimbabwe and named in 2020, may have medicinal value. 


Most newly found plants face extinction 


Any newfound plant species should by default be considered at 
risk of extinction, scientists from the Royal Botanic Gardens, 
Kew argued this week. Biologists name more than 2200 new 
species of plants per year, on average. Yet it can take many years 
to evaluate their risk of extinction. Kew scientists proposed the 
shortcut after analyzing a list of more than 1 million plant species; 
they found that the more recently a species was named, the more 
likely it is to be listed on the International Union for Conservation of 
Nature’s Red List as facing extinction. Based on recent trends, 77% of 
species first described in 2020 are likely to be at some risk of extinc- 
tion and 24% are probably critically endangered, the team estimates 
in the State of the World’s Plants and Fungi, published this week. Only 
about one-quarter of species named before 1900 are deemed threat- 
ened. The heightened conservation risk for newly described species is 
in part because they are more likely to be rare, raising the odds that 
chance events will drive them extinct. 
EU starts landmark carbon tariff 


CLIMATE POLICY | The European Union 
has launched the world’s first carbon tariff 
system in a bid to curb greenhouse gas 
emissions. The system, which began on 
1 October, requires foreign firms seeking 
to export certain products to the EU 
to calculate how much carbon is emitted 
in making the product, such as 1 ton of 
steel. The policy gives firms until 2026 to 
prepare these figures; starting then, if the 
“embedded carbon” exceeds an EU stan- 
dard, buyers must pay a tax of 20% to 35%. 
The EU is targeting steel, iron, aluminum, 
cement, fertilizer, electricity, and hydro- 
gen fuels. Climate advocates say the tariffs 
will prevent manufacturers in countries 
with weak climate policies from gaining 
an unfair trade advantage. But researchers 
warn calculating all the emissions made 
in acquiring, processing, and transporting 
materials is challenging. 


Torture-testing fusion reactor walls 


ENERGY | Workers broke ground this week 
on a facility in Granada, Spain, to simulate 
how materials are degraded by a bombard- 
ment of neutrons that may one day be 
released by a fusion power plant. Fusion 
may become a plentiful, future source 
of carbon-free power, but most schemes 
produce high-energy neutrons that over 
time can weaken the reactor’s walls and 
framework. The €700 million International 
Fusion Materials Irradiation Facility-Demo 
Oriented Neutron Source, funded by 
the European Union, will allow researchers 
to test the performance of neutron-resis- 
tant materials. The facility will generate 
neutrons by focusing an intense beam of 
deuterium nuclei on a lithium target. The 
EU hopes to use the findings to help design 
a demonstration fusion power plant gener- 
ating up to 500 megawatts by the 2050s. 


Mexico studies probe diversity 


GENOMICS | Two projects published this 
week in Nature provide new clues about the 
genetic composition of people from Mexico 
and underscore the value of diverse genetic 
data sets. Hispanic people are underrepre- 
sented in genomic databases and studies, 
and scientists may be missing many gene 
markers that could be used to develop 
therapies. An international team led by 
researchers in Mexico compiled the Mexico 
Biobank, which contains the genomes of 
some 6000 people of diverse ancestries 
across Mexico. In studies associating 
genetic markers with complex health traits, 
the Mexico Biobank outperformed the U.K. 
Biobank for analyses of some traits, even 
though the U.K. one contains genomes 
from 83 times as many people. In the 
second project, researchers sequenced the 
genomes of almost 10,000 people in Mexico 
City and identified 31.5 million genetic 
variants that had not been identified in 
other data sets. 


Albatrosses may use sound to catch the wind 


Given its giant size, the wandering albatross must depend on 
air currents to help it soar and save energy as its 3-meter 
wingspan carries it over vast swaths of open ocean. Now, 
researchers say the bird may hitch a ride by listening 
[...] frequency sounds humans can’t hear, one 
[...] so. A team of ecologists 
[...] birds’ GPS tracks against a [...] 


Trial volunteers favor Black PIs 


HEALTH DISPARITIES | U.S. Black people 
are more likely to volunteer for clinical tri- 
als led by Black principal investigators (PIs), 
according to a controlled experiment that 
is among the first of its kind. The finding 
could help increase the below-average rate 
of participation by Black volunteers, which 
stems from factors that include systemic 
racism in U.S. health care and which results 
in fewer opportunities to test treatments 
in representative groups. A research team 
asked half of a pool of more than 300 
Black potential clinical trial participants to 
view photographs of actual Black or white 
researchers. After controlling for other 
factors, the team found that those shown 
the portraits of Black PIs were 12.6% more 
likely to express willingness to join the 
scientist’s trial, they reported last month in 
a National Bureau of Economic Research 
working paper. The result was explained 
in part because the volunteers gave Black 
investigators higher marks in a study survey 
for trustworthiness, the team found. They 
also note the limited numbers of Black PIs, 
which could limit the application of the 
study’s conclusions to future trials. 


Gender gap studies win Nobel 


ECONOMICS | Claudia Goldin, an eco- 
nomic historian at Harvard University, 
this week won the Nobel Memorial Prize 
in Economic Sciences for her influential 
work identifying the driving forces behind 
women’s unequal participation in the 
workforce. Based on more than 200 years 
of historical data, Goldin’s research has 
drawn new insights into those unequal 
rates by quantifying the effects of chang- 
ing social norms that saw working women 
gaining social acceptance, as well as 
advances such as the birth control pill 
and the technological developments that 
gave rise to the service economy. Her work 
revealed how workplace cultural norms 
and motherhood penalties continue to 
contribute to women participating less in 
the workforce and receiving lower pay than 
men in comparable roles. Other scholars 
have also studied and recorded these tendencies 
among women scientists. 


[...] waves, which can [...], they report this week 
in the Proceedings of the National Academy of Sciences. The 
researchers say more work is needed to prove the birds 
rely on these sounds, which can travel thousands of kilometers. 


Bluetongue virus found in Europe 


VETERINARY SCIENCE | Bluetongue virus 
is spreading rapidly among livestock in the 
Netherlands, killing sheep and sickening 
cattle. The virus, which is spread by midges, 
does not infect humans, but concern is high 
because Dutch farms have been stricken 
with a strain for which no vaccine is commercially 
available in Europe. In the past 
2 weeks, the number of Dutch farms with 
infections rose from about 300 to more than 
1100; a sheep in Belgium tested positive this 
week. A previous outbreak, which spread 
across Europe in 2007, cost an estimated 
€175 million in the Netherlands alone. Its 
agriculture ministry is considering using a 
live virus vaccine from South Africa. 


Vaquita protection zone expands 


CONSERVATION | Mexico agreed last week 
to expand by more than 60% a protection 
zone in the Gulf of California for vaquitas, 
a small porpoise that is the world’s most 
endangered cetacean. The move creates 
an area in which fishing is restricted. Only 
about a dozen vaquitas were recorded in 
a census this year. Many have died from 
entanglements in illegal fishing nets used 
to catch an endangered fish, the totoaba; 
the new agreement also protects it. 
The herbarium at the Royal Botanic Gardens, 
Kew receives some 20,000 new specimens every year. 


Plan to move Kew herbarium roils plant world 


Botanists say transferring specimens to distant site is unnecessary and will hamper research 


By Erik Stokstad 


The herbarium at Royal Botanic Gardens, 
Kew may be the largest and 
most significant plant collection in 
the world. It contains more than 
7 million specimens dried and pressed 
on paper sheets; laid end to end, they 
would extend three times the length of the 
United Kingdom. The research at Kew, in 
southwest London, is equally impressive, 
says Barbara Thiers, who for many years di- 
rected the herbarium at the New York Botan- 
ical Garden. “What Kew does is immensely 
important and immensely influential.” 

But controversial plans for the herbarium, 
announced in June, have left Thiers and 
many other botanists worried about the 
future of that work. Kew is planning to re- 
locate the herbarium nearly 60 kilometers 
away from the gardens and current research 
labs, a move that opponents—including 
many Kew staff—say is unnecessary and will 
hamper their research. More than 15,000 
people, including plant scientists around 
the world, have signed a petition to keep the 
herbarium in its current, historic building. 
“There are very few people in my field who 
think this is probably a good thing,” Thiers 
says. “It might be, in the end, but the devil is 
definitely in the details.” 

Despite the opposition, last month Kew’s 
management announced it was moving 
ahead with the plan, which it says is vital 
for housing and protecting an ever-growing 
collection of plant and fungi specimens, 
and for collaborating with other institu- 
tions. “This is a historic decision,” says 
Alexandre Antonelli, Kew’s director of sci- 
ence, who adds that the project would be 
Kew’s largest ever investment in science, if 
funding comes through. (There are no cost 
estimates yet, but the Natural History Mu- 
seum is building a £200 million facility at 
the new site.) “I’m extremely excited about 
the prospects.” 

The main structure—originally built in 
the mid-18th century and once a royal resi- 
dence—has been expanded six times since 
it was established as the herbarium in 1853. 
But its collection has continued to grow: On 
average, about 20,000 new specimens arrive 
each year. “Basically, we’re running out of 
space,” Antonelli says. 


Finding room within Kew Gardens, he 
says, proved impossible because of restric- 
tions on construction in the World Heritage 
Site. Kew’s leaders are also concerned about 
the risk of fire, like the one that devastated 
the National Museum of Brazil in 2018, as 
well as flooding from the adjacent Thames 
River. They say consultants have advised it 
isn’t feasible to bring the historic building 
up to an acceptable level of safety. 

So, management began to look elsewhere, 
and in June announced its preferred site: the 
Thames Valley Science Park (TVSP), located 
about 45 minutes’ drive away and owned by 
the University of Reading. The majority of 
the herbarium’s specimens would be moved 
to a purpose-built structure on the site, which 
would also include labs for extracting DNA 
and digital imaging and provide space for 
more than 150 researchers, including her- 
barium staff. “It’s going to be a world class 
facility,” Antonelli says. 

In August, an anonymous Kew curator 
launched the petition opposing the move, 
which also argues the current building can 
be retrofitted to safely accommodate the 
growing collection. A letter to trustees signed 
by 170 staff scientists maintains that separating 
the herbarium and its taxonomists from 
the rest of Kew would impede research. Bio- 
logists must sometimes evaluate dried speci- 
mens for features they can’t see on images, 
for instance, and taxonomists work closely 
with molecular biologists to identify plants 
and their evolutionary relationships based 
on DNA samples. Pamela Soltis, a botanist 
at the Florida Museum of Natural History, 
agrees that a split would be detrimental. “The 
quality of the science would definitely suffer.” 
Matthew Jebb, who directs the National Bo- 
tanic Gardens of Ireland and signed the peti- 
tion, calls the plan “an astonishingly foolish 
thing to do.” 

At a June meeting, 20 scientists presented 
their arguments directly to Kew trustees. 
But the trustees—who include Paul Nurse, 
director of the Francis Crick Institute, and 
two other prominent scientists—were not 
swayed. At a subsequent meeting with man- 
agement, staff were warned not to contact 
journalists or the trustees, according to the 
curator who started the petition. Some say 
they feel intimidated about protesting the 
move. A Kew spokesperson disputes this 
characterization: “Staff have been reminded 
to act respectfully and professionally.” 

But many scientists decry what they see 
as lack of transparency. “It’s outrageous 
what’s going on,” says Robert Scotland, a 
plant taxonomist at the University of Ox- 
ford. “How can you have an adult, reasoned 
discussion about this, when it’s all taking 
place behind closed doors?” Kew disagrees, 
saying: “Stakeholders have been involved in 
conversations and consultations for several 
years and continue to be.” 

Advocates of the move see long-term 
benefits. University of Oxford zoologist 
Charles Godfray, a former Kew trustee, 
hopes that Kew’s presence in the TVSP 
could even lead other institutions to move 
their botanical collections and create a 
“world-leading National Herbarium.” (The 
Natural History Museum plans to send 
28 million mammal, invertebrate, and other 
specimens to its facility at the science 
park—though the museum’s 2 million plant 
specimens will remain at its herbarium in 
central London.) Freeing up space at Kew’s 
herbarium should also benefit the 400 or 
so staff researchers who will remain at 
Kew Gardens, creating room for new labs, 
offices, and public displays about science. 

Kew management will make a final deci- 
sion about the move in December, pending 
further risk assessments of the site and com- 
mercial negotiations. Assuming the U.K. gov- 
ernment, Kew’s primary funder, greenlights 
the project, it will take 5 to 7 years to design 
and build the new herbarium and transfer 
the collections.

Makers of tiny, tunable points of
light win chemistry Nobel 


Researchers share award for developing quantum dots, 
which fluoresce in vivid colors determined by their size 


By Daniel Clery and Sam Kean 


Three researchers have been awarded
this year’s Nobel Prize in Chemistry 
for work on quantum dots—tiny crys- 
tals a few dozen atomic diameters 
wide that have highly tunable optical 
and electronic properties. 

Moungi Bawendi of the Massachusetts In- 
stitute of Technology, Louis Brus of Columbia 
University, and Alexei Ekimov of Nanocrys- 
tals Technology Inc. will share a prize of 
roughly $1 million for discovering the crys- 
tals and showing how to produce them reli- 
ably. In doing so, they “planted an important 
seed for nanotechnology,” the Royal Swedish
Academy of Sciences said in a press release 
last week. The dots, which fluoresce in bril- 
liant colors, have found applications in tele- 
vision and computer displays, LED lighting, 
and medical imaging. Scientists now envision 
using them to create tiny lasers, improved so- 
lar cells, and quantum computers. 

The award “is a great example of funda- 
mental science connected to things where 
you already see applications from the work,” 
says Cherie Kagan, a materials engineer at 
the University of Pennsylvania who was a former
graduate student of Bawendi’s. “There’s a
lot more in front of us.” 

Theorists speculated about the powers of 
such tiny structures as long ago as the 1930s, 
in the early days of quantum mechanics. The
theory, which describes the behavior of the
atomic world, implied that crystals a mil- 
lionth the size of a pinhead would act like 
a box, confining electrons in a way that al- 
ters their properties. A smaller box would 
compress the wavelike electrons. When 
stimulated by an external source of light, the 
electrons of a smaller quantum dot should 
emit bluer, shorter wavelength light. A larger
dot should emit longer wavelength yellow or 
red light. 

In the late 1970s, Ekimov, then at the 
Vavilov State Optical Institute in Russia, first 
managed to make nanometer-size crystals of 
copper chloride, embedded in glass. He con- 
firmed that dots of different size fluoresce in 
different colors. 

A few years later, Brus, then at Bell Labs, 
was looking for catalysts to capture the en- 
ergy of sunlight in a chemical reaction. When 
Brus’s team crystallized particles of cadmium 
sulfide out of a solution, they noticed that the 
larger ones reacted to light differently than 
the smaller ones and realized it was the same 
quantum phenomenon. Whereas Ekimov’s 
dots were “frozen” in glass, Brus’s dots were 
suspended in solution, which let them flow 
and made them more attractive for applica- 
tions like displays. 

Still, defects in these early quantum 
dots—especially their variable sizes—kept 
them from wide commercialization. In 1993,
Bawendi and his team, by then at the
Massachusetts Institute of Technology, devised
a way to make high-quality crystals of a
well-defined size that would produce sharp,
vivid light of one specific color.

Louis Brus, Alexei Ekimov, and Moungi Bawendi (left to right) pioneered the development of quantum dots.

The process involved injecting the ingre- 
dients for quantum dots into a hot solvent 
to immediately form crystal seeds, and then 
quenching their growth by rapidly cooling 
and diluting the solvent. The solvent was then slowly
warmed up again to continue crystal growth 
in a more controlled way to make dots of a 
required size—usually 2 to 10 nanometers. “It 
was just an exciting time,” says David Norris, 
a materials engineer at ETH Zurich and a for- 
mer graduate student of Bawendi’s. 

Many companies now compete to produce 
quantum dots cheaply for different techno- 
logies, using semiconductor materials such 
as zinc selenide, cadmium selenide, or in- 
dium phosphide. The market for quantum 
dot applications in the United States alone 
reached $4 billion in 2021, and some 8% of 
the global TV market now relies on quantum 
dots to add brilliant colors. 

Researchers are looking ahead to other 
applications. In medicine, doctors want to 
use quantum dots as tissue-specific beacons 
to hunt for tumors or other problems. For 
example, quantum dots covered in organic 
materials to make them more biocompatible 
inside cells and blood could map blood ves- 
sels and lymph nodes or monitor changes in 
tumors. The dots could also help track the 
movements of drugs throughout the body. 

In solar cells, quantum dots could be tuned 
to absorb a wider spectrum of light and con- 
vert it more efficiently into electrical energy. 
Because the dots produce such specific wave- 
lengths of light, they could also act as micro- 
scopic lasers to optically shuttle information 
around computer chips, reducing heat loss 
and making chips more energy efficient. 

Another potential use is in quantum com- 
puters, which may one day outstrip even 
supercomputers for some applications. Some 
researchers are trying to manipulate the 
spins of quantum dot electrons to serve as 
the gates and switches of a quantum com- 
puter, while others hope to exploit individual 
photons produced by quantum dots for a 
light-driven quantum computer. “That’s not 
in the commercial domain yet, but that could 
be the future,” says Mark Fox, an optical phys- 
icist at the University of Sheffield. 

Somehow, the laureates’ names—a closely 
guarded secret—were leaked to Swedish 
newspapers hours before the Nobel commit- 
tee’s announcement. But Bawendi, speaking 
on the phone during the press conference, 
said he was “surprised, sleepy, shocked ... 
and very honored” by the award. He added, 
“There’s still a lot of exciting work to be 
done in this field, that’s for sure.” 


With reporting by Catherine Offord.

Gene-edited chickens (right) strongly—but not fully—resist flu viruses that readily infect normal chickens (left).


INFECTIOUS DISEASE 


In ‘proof of concept,’ CRISPR-edited
chickens shrug off flu


Edits to one gene can’t completely stop infections, however 


By Jon Cohen 


For 3 decades, Helen Sang has tinkered
with the genomes of chickens to try to
make the birds resistant to the flu viruses
that periodically devastate flocks
and raise fears of a human pandemic.

Now, as an especially virulent strain of 
avian influenza sweeps through poultry and 
wild birds around the world, the geneticist at 
the University of Edinburgh’s Roslin Institute 
has her first solid success. Using the CRISPR 
gene editor and recent findings about what 
makes poultry vulnerable to flu, Sang and 
colleagues from three other institutions have 
created chickens that can resist real-life doses 
of avian flu viruses. “Sticking to it gets you 
somewhere in the end,” she says. 

The result, published this week in Na- 
ture Communications, is “a long-awaited 
achievement,” says Jiri Hejnar, a virologist 
at the Czech Academy of Sciences’s Institute 
of Molecular Genetics whose group showed 
in 2020 that CRISPR-edited chickens could 
resist a cancer-causing virus. But farmers 
won't soon be raising flu-proof chickens. 
The edited birds still became infected when 
exposed to larger amounts of the flu virus. 
And the strategy raises a safety concern: 
Chickens edited this way could drive the 
evolution of flu variants better at infecting 
people. “What this showed is a proof of concept,”
says Sang’s co-author Wendy Barclay,
a virologist at Imperial College London.
“But we’re not there yet.”

The researchers focused on a gene that 
Barclay’s lab showed in 2016 is key to en- 
abling avian influenza viruses to grow in 
chicken but not human cells. The gene 
codes for a protein, ANP32A, that normally 
plays a role in transcribing DNA into messenger
RNA. The chicken ANP32A has
33 more amino acids than the human ver- 
sion, and an avian flu enzyme called poly- 
merase can co-opt it to make new virions. 

Three years later the researchers found 
chickens have a second gene for a similar 
protein, ANP32B, without this vulnerability. 
The virus can’t exploit ANP32B because it 
differs in two amino acids from ANP32A. 
So the new work used CRISPR to introduce 
those mutations into the ANP32A gene in 
chicken primordial germ cells—the precur- 
sors of eggs and sperm—paving the way to 
breeding chicks with the desired mutations. 
The altered birds appeared healthy. 

To see whether they could resist infec- 
tion, the researchers put an avian influenza 
virus into the nasal cavities of 20 2-week-old 
chicks, only half of which had the modified 
gene. All 10 wild-type birds became infected,
but only one edited bird did. That chicken 
did not transmit the virus to others with the 
resistance gene, further work showed.

Wang Xiaojun, a virologist at the Chinese
Academy of Agricultural Sciences’s Harbin 
Veterinary Research Institute, calls the re- 
sults conceptually significant. But he notes 
that influenza viruses often mutate around 
host cell restrictions. And that’s just what 
happened. When the researchers inoculated 
gene-edited chickens with 1000-fold higher 
doses of the virus, all became infected. Still, 
infections developed more slowly, reached 
lower viral levels and were far less likely to 
spread to wild-type chickens. Gene-edited 
chickens were entirely resistant to spread. 

An analysis of the viruses that grew in the 
modified birds revealed something more 
disturbing, however: mutations in their 
polymerase genes. The mutations allowed 
the enzyme to still get some help from 
the edited ANP32A protein and also from 
ANP32B and a third member of the same 
family, ANP32E, which like ANP32B typi- 
cally plays no role in influenza replication. 

Virologist Sander Herfst of Erasmus Uni- 
versity Medical Center says there is a “high 
probability” that if mutant viruses arise be- 
cause of gene edits in chickens, they will be 
better adapted to mammals as well. “A watertight
system where no more replication takes
place in chickens is necessary,” says Herfst,
who has traced how avian flu viruses can 
evolve to transmit between mammals. 

The answer may be more CRISPR. “Ad- 
ditional genome editing of the ANP family 
could eliminate the risk of virus escape and 
adaptation,” says Jae Yong Han, a poultry 
geneticist at Seoul National University. 

The U.K. team agrees. “The major mes- 
sage of our paper is to use multiple edits,” 
Barclay says. She and co-workers showed 
in test tube experiments that even a highly 
pathogenic influenza virus could not estab- 
lish infections in chicken cells if they lacked 
all three ANP32 genes. But knocking out 
these genes would likely harm chickens’ 
development and fecundity. The challenge 
now, the team says, is to find small muta- 
tions in the other ANP32 genes that better 
ward off the viral polymerase but still allow 
the proteins to function. 

Even if genome edits can fully protect 
birds without harming their health, the birds 
will face regulatory concerns. Yet because the 
small gene edits made by CRISPR mimic mu- 
tants that already exist in nature, the regu- 
latory barriers will be lower than for earlier 
approaches that introduced new genes or 
mixed genomes of different species. 

But the CRISPRed chickens will need to 
win over consumers as well, says Roslin In- 
stitute veterinarian Alewo Idoko-Akoh, first 
author of the paper. “It’s not just enough to 
develop the technology—it’s got to be done 
in such a way that it’s culturally sensitive 
and also acceptable.”

COVID-19


New Novavax vaccine OK’d: 
studies size it up against mRNA 


Limited data and lack of head-to-head studies make 
comparisons between shots tricky 


By Jennifer Couzin-Frankel 


For cardiologist Eric Topol, last week’s
vaccine news presented a personal di- 
lemma. Topol, who directs the Scripps 
Research Translational Institute and 
is a popular commenter on COVID-19 
research, had hoped to get an up- 
dated COVID-19 vaccine from Novavax, 
rather than a messenger RNA (mRNA) shot 
from Pfizer or Moderna. Novavax relies on 
an older, protein-based approach that has 
shown long-lasting effects against other 
pathogens, and Topol wondered whether it 
might produce more durable protection. 
On 3 October, it seemed he might get his 
chance: a drugstore he visited for an mRNA 
vaccine ran out of doses, and 
hours later the U.S. Food and 
Drug Administration autho- 
rized a Novavax shot well- 
matched to current COVID-19 
variants. The green light 
marks the first time Novavax 
will be widely available to 
teens and adults. 
“It’s hard to know how it compares” to mRNA
vaccines, Topol admits; there are no
head-to-head studies to rely on. In clinical
trials, Novavax appeared less likely
than mRNA shots to cause side effects like
headache and fatigue. But how does it stack
up against mRNA vaccines when it comes to
protection against SARS-CoV-2? The question
has been vexingly difficult to answer.

Some hints are emerging, including the
first large study of Novavax in the real
world, published last week by a team in
Italy. The results are far from definitive, but
they suggest “there aren’t massive differences”
between the vaccines, says Alberto
Mateo Urdiales, an epidemiology and infectious
disease researcher at the Italian National
Institute of Health, who led the study.

Whereas mRNA vaccines carry instructions
for making a SARS-CoV-2 protein,
Novavax directly delivers a version of that
viral spike protein with an adjuvant for
boosting immune response. Such protein
subunit vaccines have yielded durable protection
against various pathogens including
hepatitis B and shingles, along with some
respiratory ailments such as pneumococcal
pneumonia. A version of the Novavax vaccine
targeting the original SARS-CoV-2 variant
was approved as a primary vaccination
series and first booster in the United States
in 2022; it also became available in Europe
that year. Its tried-and-true technology
appealed to some people wary of the new
mRNA approach. And unlike the more fragile
mRNA shots, it lasts for months in the
refrigerator. But uptake has been low and
the company is banking on more shots in
arms this fall.

The Italian team tried to pin down how
well the shot actually works, analyzing data
on more than 20,000 Italians who had
received two doses as their primary vaccine
series in 2022. After 4 months, the vaccine
was 55% effective at staving off symptoms
from a SARS-CoV-2 infection and 28%
effective at preventing infection altogether,
the researchers reported in JAMA Network
Open. That’s roughly comparable to how
the mRNA vaccines have performed, Mateo
Urdiales says. He cautions that the
emergence of SARS-CoV-2 variants, repeated
boosting, and swelling numbers of infections
make it hard to compare the Novavax numbers
with those from studies of other vaccines,
however.

“I’m not sure that on their face any of the vaccines are particularly better than the other.”
Kirsten Lyke, University of Maryland School of Medicine

Smaller studies, meanwhile, have tried to 
address another reason a Novavax booster 
appeals to people like Topol: the possibil- 
ity that “mixing and matching” various 
COVID-19 vaccines might provide better 
protection than any single vaccine brand. 
“There was theoretical hope that since these 
vaccines work in slightly different ways, they 
would have different strengths in terms of 
which part of the immune system they activate
best,” says Angela Branche, an infectious
disease specialist at the University of Roch- 
ester. She co-chairs a mix-and-match study 
called COVAIL that includes another protein 
subunit vaccine from the company Sanofi, 
which is not available in the United States.

At the University of Maryland School of 
Medicine, infectious disease specialist and 
vaccine researcher Kirsten Lyke is lead- 
ing a mix-and-match study in which 67 of 
the roughly 830 participants got the origi- 
nal Novavax vaccine as a first booster, hav- 
ing received Pfizer, Moderna, or Johnson & 
Johnson as their primary vaccination. Three 
months after that booster, their levels of neu- 
tralizing antibodies were similar to those in 
people who got an mRNA booster instead, 
the team reported in July in NPJ Vaccines. 
(Neutralizing antibodies may help protect 
against infection and illness.) 

Mixing and matching has sometimes pro- 
duced superior immune responses, both for 
COVID-19 and other vaccines. But Branche 
notes that protein and mRNA vaccines may 
be more similar, immunologically, than they 
appear at first blush, because both rely on 
the SARS-CoV-2 spike protein to trigger an 
immune response. 

A health care worker holds a vial of Novavax vaccine at a clinic in Italy.

Researchers also want to know how long
protection lasts with Novavax versus mRNA
vaccines. Mateo Urdiales found that
protection from infection dropped in the first
4 months after Novavax vaccination, but it 
seemed to hold steady against symptoms; 
other studies have shown that with mRNA 
vaccines, protection against symptoms as 
well as infection declines in that time frame. 
Lyke’s analysis hinted that levels of neu- 
tralizing antibodies waned more slowly af- 
ter a Novavax booster than after an mRNA 
booster. But that doesn’t prove Novavax’s 
protection is more durable, she stresses: No- 
vavax hit the scene much later, when many 
recipients had enhanced immunity from a 
previous infection. “This durability question 
is influenced by many different factors other 
than what the vaccine does,” Branche notes. 
Ultimately, “I’m not sure that on their
face any of the vaccines are particularly better
than the other,” Lyke says.

Comparisons of safety are uncertain, 
too. A key concern for mRNA vaccines is an 
inflammation of the heart known as myo- 
carditis, a rare side effect of vaccination 
seen most often in young men. In Novavax 
trials, there were four cases of myocarditis 
in the vaccine arm within 20 days of vac- 
cination and none in the placebo group 
during that time frame. June data from 
the Australian government noted myo- 
carditis reports for three to four of every 
100,000 Novavax doses, a rate roughly 
similar to what Australia reports for the 
mRNA vaccines. 

But because the side effect is so uncom- 
mon and Novavax has been much less widely 
used, “I don’t think anyone can know the 
true rate yet of myocarditis from Novavax,” 
says Walid Gellad, a physician who studies 
drug safety at the University of Pittsburgh. 
“I would not assume yet that Novavax is the 
solution to the myocarditis issue of mRNA” 
vaccines, he adds. 

Scientists hope more data 
on Novavax will come. Lyke is 
examining immune responses
6 months and 12 months after 
the original Novavax booster in 
her small cohort. Meanwhile, a 
rare head-to-head clinical trial 
began in February in Melbourne, 
Australia. It includes almost 
500 people who already received 
three vaccine doses. Some are 
being randomized to get either a 
Moderna bivalent booster, which 
became available last fall, or the 
original Novavax vaccine. Others 
who opted against any booster 
will serve as a control group. 

“We want to determine the 
best vaccine for ongoing boost- 
ers,” says Claire von Mollendorf, 
who co-leads the study at the 
Murdoch Children’s Research Institute. She 
expects to share results from a 12-month 
follow-up of participants in early 2025. 

The best time to pit Novavax head-to-head 
against the other COVID-19 vaccines may be 
right now, with updated boosters becoming 
available that all target the same version of 
Omicron, called XBB. This fall, “I think most 
people will take what they are offered, and 
then you can compare,” Mateo Urdiales says. 
He’s hoping to launch such a study, using 
Italian registry data on vaccination. 

As for Topol, upcoming travel and uncer- 
tainty about Novavax’s availability led him to 
opt against waiting. He got a Pfizer shot at 
a nearby grocery store last week—though 
he still wonders whether and how a dose of 
Novavax might have been different.

Risk-taking health agency places first bets


ARPA-H aims to be unlike 
NIH, director says 


By Jocelyn Kaiser 


A year ago, when applied biologist
Renee Wegrzyn took the helm of the
Advanced Research Projects Agency 
for Health (ARPA-H), questions 
swirled around the brand-new agency, 
which was created by President Joe 
Biden to fund daring, cutting-edge biomedi- 
cal research. Would the new organization be 
sufficiently different from the sluggish, risk- 
averse National Institutes of Health (NIH)? 
How would Wegrzyn, who had never run an 
agency, navigate pressures from Congress to 
shape the effort? And how quickly could she 
recruit the type of innovative scientific staff 
needed to identify out-of-the-box projects on 
which to spend a billion-dollar budget? 

Now, answers are starting to emerge. 
ARPA-H has in recent months announced 
a growing list of research awards for efforts 
the agency says are more ambitious, and less 
certain to succeed, than what NIH would 
typically support. Among them: a plan to re- 
generate cartilage and bone in osteoarthritis 
patients and an unprecedented effort to build 
a functioning heart using 3D printing with 
living cells. The agency has hired 390 staffers, 
some of whom will take an unusually active 
role in shaping research without the outside 
peer-review NIH projects typically get. By the
end of September, ARPA-H had already tenta- 
tively obligated close to $1 billion of its initial 
$2.5 billion budget. 

“Nothing like this has ever existed inside 
the health ecosystem,” Wegrzyn told Science 
in an interview last week at ARPA-H’s current 
offices, a suite of rented rooms in Arlington, 
Virginia. “This is a place where we take very 
big risks that NIH can’t take.” 

The fierce debate over whether ARPA- 
H is needed now awaits the results of the 
gambles placed by Wegrzyn and her staff. 
Onlookers say although the agency’s start has 
been slower than hoped, they are generally 
pleased with its initial slate of projects. “The 
investments that they’re making look pretty
fantastic,” says consultant Michael Stebbins,
a geneticist and former White House science
staffer who pushed for ARPA-H’s creation.

Renee Wegrzyn is drawing on her DARPA experience to guide a health-focused agency with a similar philosophy.

“They have launched some novel and ex- 
citing projects” that are “complementary” 
to the work of NIH, adds Ellen Sigal, CEO 
of Friends of Cancer Research, a research 
advocacy group. 

Pitched by the Biden administration in 
2021 as a $6.5 billion agency, ARPA-H is 
meant to be the biomedical equivalent of the 
Defense Advanced Research Projects Agency 
(DARPA), known for developing GPS and the 
Internet. As at DARPA, a small group of pro- 
gram managers will have considerable free- 
dom to choose many of ARPA-H’s efforts and 
specific awardees. 

ARPA-H’s gestation was contentious. Some 
research groups worried the new agency 
would siphon funds from NIH, which has 
its own “high-risk, high-reward” programs. 
Other observers doubted it would have the 
autonomy it needed to succeed. Some said 
it should be completely independent from 
NIH, whereas others argued that an NIH 
link would provide key expertise and infra- 
structure. After Congress created ARPA-H 
in March 2022, Department of Health and 
Human Services Secretary Xavier Becerra 
ultimately placed the agency under NIH, but 
Wegrzyn reports directly to him. Congress 
also said ARPA-H can’t be physically on the 
NIH campus in Bethesda, Maryland, which 
helps explain why the agency is currently 
working out of a WeWork suite in Virginia. 
(The agency will announce permanent head- 
quarters in the Washington, D.C., area soon.) 

Wegrzyn is a former program manager
in DARPA’s biology division who also spent
time at the synthetic biology company 
Ginkgo Bioworks. Less than 6 months after 
she took the helm, ARPA-H released a call 
for proposals. Unlike NIH, it didn’t initially
ask for long applications full of support- 
ing data—just 3 pages outlining a vision 
or major goal and how to achieve it. It 
drew a “large volume” of such applications, 
Wegrzyn says, and invited select groups to 
submit fuller proposals. The agency has so 
far funded seven projects, mostly with aca- 
demic teams, from the printed heart to a 
cancer treatment that relies on bacteria to 
better diagnostic tests to help combat anti- 
biotic resistance. It is also making a push in 
medical information systems, with awards 
planned for developing ways to pool data 
from across health systems. 

Several initial recipients are former 
DARPA awardees; that’s not surprising, says 
Brad Ringeisen, former director of DARPA’s
biotechnology division and now at the Uni- 
versity of California, Berkeley. “I think it 
makes sense to leverage some of those past 
performers,” he says. In contrast to DARPA 
research, which is meant to help the military 
or its soldiers, ARPA-H’s projects aim to improve
the health of all Americans.

ARPA-H is not supposed to focus on spe- 
cific diseases—that’s NIH’s remit. Some of 
the agency’s first projects do focus on can- 
cer and fall under Biden’s reignited Moon- 
shot, a plan to slash deaths from the disease 
in half by 2047. But Wegrzyn points out 
that the cancer efforts really aim to develop 
“platforms,” or technologies, that can also 
help treat other health problems such as 
autoimmune diseases. 

If ARPA-H successfully follows its namesake’s
history, the backbone of the agency
will be its “programs,” led by a term-limited 
manager who conceives an idea, then as- 
sembles an outside team of scientists to 
carry it out with funding that might top
$100 million. ARPA-H has announced
just four programs so far. They include
an osteoarthritis effort to regenerate
tissues, a precision surgery initiative, and a 
scheme to implant cell-loaded devices in the 
body that will deliver medicines and serve 
as disease sentinels. The fourth program, on 
computational strategies to design vaccines 
that protect against many strains of a virus, 
was announced this week. 

The agency is also forming what it calls “a 
hub-and-spoke health innovation network,” 
dubbed ARPANET-H, to help coordinate ef- 
forts such as clinical trials or tests of devices 
in community health centers or rural hospi- 
tals. Staff will be based in three centers: the 
agency’s Washington, D.C.-area headquarters;
a Boston area “investor catalyst” hub
aiming to get ideas to the market quickly;
and a center in Dallas focused on “customer
experiences”—such as diversifying the agency’s
clinical trial participants. But ARPA-H
managers—aware of lawmakers’ desire to 
have broad geographic reach—say the net- 
work will extend into all 50 states. 

ARPA-H staff have fanned out across the
country to hold “proposers days,” where they 
explain to academic researchers how its 
model differs from NIH’s. Teams selected by 
ARPA-H, funded with contracts rather than 
grants, will work closely with the agency and 
could have awards pulled if they don’t meet 
milestones. “It’s a different way of thinking 
and it’s not for everybody,” says Ringeisen,
who estimates only 10% to 20% of academic 
researchers will find it appealing. 

Under instructions from Congress, which 
is closely following the new agency, Wegrzyn 
must provide quarterly updates to law- 
makers. She says that part of her mission is 
to explain that aiming high means “we will 
fail sometimes.” And she notes it could be 
10 to 15 years before program managers’ bets 
pay off with health benefits for the country. 
“There’s so much excitement and promise 
around ARPA-H,” but also a need to “set ex- 
pectations,” she says. 

Wegrzyn’s chief goal in her second year
is to keep hiring program managers, whom
she sees as the most important element of 
the agency. She hopes to have 20 on board 
by the end of the year and at least double 
that number in 2024. What happens after 
that will depend on future budgets, and it’s 
unclear whether Congress will even give 
ARPA-H a raise in 2024. (A Senate bill keeps 
the agency’s budget flat at another $1.5 bil- 
lion, whereas the House of Representatives 
wants to cut it to $500 million; meanwhile 
the agency has 2 more years to spend its current
funding.)

“I don’t have a crystal ball” to predict the
agency's future budget, Wegrzyn says. But for 
now, “launching these programs is what I’m 
super, superexcited about.”

Mexican biologist Rodrigo Medellin Legorreta
is fighting political headwinds to preserve his
country’s natural heritage—and his own legacy


By Richard Stone and Rodrigo Pérez Ortega, in the Calakmul Biosphere Reserve on Mexico's Yucatan Peninsula 


The mother bat screeches furiously
as Rodrigo Medellin Legorreta 
grips her, a thumb under her chin 
to prevent her razor-sharp teeth 
from sinking into his gloved hand. 
“These creatures are feisty. They 
are witty,” he says. “And when
they do manage to bite you, you 
scream, and you curse.” With the 
creature subdued, he gently pries a furry 
bulge from her chest. He hands the baby 
woolly false vampire bat (Chrotopterus 
auritus) to Angel Torres Alcantara, an 
undergraduate on his first field trip to 
El Hormiguero, a Maya temple ruin at
the base of the Yucatan Peninsula in south- 
eastern Mexico. 

On a broiling day in late June, 
Torres Alcantara and two other students 
in Medellin Legorreta’s research group at 
the National Autonomous University of 
Mexico (UNAM) set to work weighing and 
measuring the five bats they captured from 
a roost inside a temple chamber. Torres 
Alcantara spreads one of the baby bat’s 
wings on a rubber mat as a Ph.D. student, 
Monica Izquierdo Suzan, punches out a 
snippet of skin for DNA analysis. A more
experienced Ph.D. student, Javier Torres
Cervantes, deftly inserts a tiny radio tran- 
sponder under the bat’s scapula for iden- 
tification the next time it’s captured. The 
aim is to understand how this carnivorous 
species—the second largest bat in North 
America—is coping with habitat frag- 
mentation and climate change. “They’re 
good indicators of the state of the forest,” 
Medellin Legorreta says. 

Medellin Legorreta doesn’t just study 
bats; he fights for them. The 65-year-old 
conservation biologist may be best known 
among peers for helping bring the Mexican 
lesser long-nosed bat (Leptonycteris yerbabuenae)
back from the brink of extinction.
Also known as the tequila bat because it of- 
ten feeds on the nectar of the agave plants 
used to make that spirit, it became the first 
mammal delisted in Mexico, thanks to a re- 
covery plan he devised. 

But Medellin Legorreta has ranged well 
beyond bats: spearheading programs to 
protect wintering grounds for monarch 
butterflies, for instance, and to secure 
habitat corridors for jaguars. “He’s one of 
the most well-rounded biologists in Mex- 
ico,” says Hesiquio Benitez Diaz, a biologist 
with Mexico’s National Commission for the 
Knowledge and Use of Biodiversity (CONA- 
BIO). In 2019, the National Geographic 
Society anointed Medellin Legorreta its 
seventh explorer-at-large, putting him 
in the company of marine biologist Sylvia
Earle, who took the deepest untethered sea 
walk, and oceanographer Robert Ballard, 
who found the wreck of the Titanic. 

Lately, Medellin Legorreta’s activism has 
put him on a collision course with Mexico’s 
government and president, Andrés Manuel 
Lopez Obrador. “In my lifetime, I haven’t 
seen an administration with less inter- 
est and less concern for the environment 
than this one,” he fumes. In 2020, Lopez 
Obrador’s administration 
drained CONABIO’s budget,
sapping the agency’s power 
to guide conservation projects 
and policies. Then, in March, 
the government’s plan to save 
the vaquita, a critically endan- 
gered porpoise, was deemed 
so woeful by the Secretariat 
for the Convention on Inter- 
national Trade in Endangered 
Species of Wild Fauna and 
Flora (CITES) that it levied 
massive sanctions on Mexico’s 
wildlife trade. In response to 
international pressure, Mexico last week 
announced an expanded protection area 
for the vaquita. 

Medellin Legorreta’s personal bête noire
is the Maya Train, a $30 billion rail line 
and tourism project in the Yucatan Pen- 
insula championed by Lopez Obrador as 
a bonanza for an impoverished region. 
Medellin Legorreta and others contend 
that the project, now under construction, 
is fragmenting rainforest habitat and will 
harm scores of species (Science, 21 January 
2022, p. 250). 

“It’s tough to be a conservation biologist 
in Mexico these days,” Medellin Legorreta 
says. A longtime member of Mexico’s dele- 
gation to CITES, he was tossed off it in De- 
cember 2022 in retribution, he believes, for 
his unabashed critiques of the Maya Train. 
He had served on the delegation’s animals 
committee since 2000, guiding govern- 
mental actions to protect mako sharks, 
iguanas, and Mexican crocodiles. Last year, 
he says, his cellphone was targeted with 
Pegasus, software the Mexican government 
has used to spy on journalists, human 
rights activists, and dissidents. “He’s one of 
the few that has taken a risk by raising his 
voice,” says conservation biologist Valeria 
Towns Alonso, a former student of Medellin 
Legorreta’s who is now with Pronatura 
Noroeste, a nongovernmental organization. 
“He’s a man of principles.” 

Despite the political headwinds, Medellin 
Legorreta hasn’t given up on his homeland. 
“We'll save as much of our biodiversity as 
we possibly can,” he says. “We won't go 
down without a fight.” 




MEDELLIN LEGORRETA’S PASSION for wild 
things was kindled at a very young age. 
“I was an odd little kid,” he says. “My first 
word wasn’t ‘mama.’ It was ‘flamingo!’”
He memorized every scrap of information 
about animals he could lay hands on. By 
age 12, he’d become infatuated with the 
idea of appearing on a popular TV game 
show called El Gran Premio. His mom per- 
suaded a show producer to give Medellin 
Legorreta a tryout and in 1970, he became 
the first child to appear on the 
Saturday evening quiz show, 
which offered cash prizes. He 
made it to 32,000 pesos (then 
about $2500) before tripping 
up on a complex question 
about mammal classification. 

One person who happened 
to tune in that evening was 
Bernardo Villa Ramirez, the
founder of mammalogy in Mex- 
ico and a professor at UNAM, 
Mexico’s most prestigious uni- 
versity. Villa Ramirez tracked 
down the precocious kid and 
invited him to hang out with working bio- 
logists. One afternoon, another UNAM 
mammalogist handed Medellin Legorreta
a Waterhouse’s leaf-nosed bat (Macrotus
waterhousii). “It shook me inside,” he recalls.
“This is exactly the moment when I
felt, ‘I’m never looking back. This is what
I’m here for.’” As he drifted off to sleep each
night, he’d recite in his mind the ABCs of
bat genera: “A” for Artibeus (a genus of
fruit bats), “B” for Bauerus (a single-species
genus, Van Gelder’s bat), and so forth.

Medellin Legorreta turned the room 
he shared at home with an older brother, 
Mario, into a menagerie. He kept a kin- 
kajou, a cat-size mammal from the rain- 
forest also known as the honey bear, as 
well as less cuddly companions—includ- 
ing a rattlesnake curled up near the door. 
“Mario was not happy about that.” (He en- 
dured and went on to be a popular singer 
of romantic ballads.) 

Then there were the 10 common vampire 
bats (Desmodus rotundus) that Medellin 
Legorreta, then 14, captured south of Mex- 
ico City and brought back to the family 
bathroom. He fed them cows’ blood from 
a farm run by UNAM’s veterinary school, 
which he kept in ice cube trays in his fam- 
ily’s freezer. He thawed one cube for each 
bat every evening. The bats were messy 
eaters. “It was like a Hitchcock movie!” 
Medellin Legorreta recalls. 

His parents were too distracted to put up 
a fuss, Medellin Legorreta says. His father 
had his hands full running an ice cream 
factory, and his mother was a professional 
opera singer. The youngest of five, Medellin 
Legorreta was essentially raised by his
sister Enriqueta, who was 9 years older. 
Enriqueta, later a UNAM-trained surgeon 
and environmental activist, indulged her 
like-minded little brother, buying him 
his first microscope and shuttling him 
to UNAM. 

Although his bat obsession distracted 
Medellin Legorreta from his schoolwork, 
he eventually buckled down and earned a 
biology degree from UNAM. One of his earliest
papers was on the predatory behavior
of woolly false vampire bats, which he
kept at home and fed live mice.
“I would fall asleep to the sweet
sound of bones being crushed,” he says 
with a smirk. “This was the level of sick- 
ness in my mind.” 


DRIVING WEST FROM El Hormiguero toward 
the Calakmul Biosphere Reserve, the road 
wends through ceibas, strangler figs, and
sapodillas, a tree prized since Maya times
for its chicle gum. Ocelots and jaguars 
prowl the forest, and spider monkeys and 
black howlers agitate the canopy. Then the 
road crosses a gash, 100 meters or so wide, 
extending to the horizon in both directions: 
the future route of the Maya Train. “The 
wound in the forest is very long and very 
deep,” Medellin Legorreta says. “They’re de- 
stroying a vast area of primary rainforest. 
And for what?” 

For economic growth, according to 
Lopez Obrador. The Maya Train, named 
after the region’s Indigenous people, 
aims to transport more than 40,000 pas- 
sengers daily across 1500 kilometers. All 
told, the United Nations Human Settle- 
ments Programme estimates, the project 
will create upward of 1 million jobs. But 
they’ll come at a cost. In 2019, Mexico’s Na- 
tional Council on Science and Technology 
warned that the train would threaten at
least 10 protected natural areas and nearly
1300 archaeological sites. 

Medellin Legorreta has won a few con- 
cessions from Maya Train planners. For 
instance, the original route through the 
Calakmul reserve crossed one end of a vast 
cavern, dubbed “the bat volcano,” that’s 
home to about 3 million bats of eight spe- 
cies. Every evening at dusk, most of the 
resident bats emerge from the cavern’s 
mouth in a tornadolike swirl and fan out 
in search of prey. Alarmed that the rail 
line’s construction could partially collapse 
the cave, “I started bitching and fighting,” 
Medellin Legorreta says. Project managers 
planned to shift the route north—but that 
would have taken it through prime jaguar 
habitat, Medellin Legorreta says. He com- 
plained again, and now the train will pass 
2 kilometers south of the cave. 

Bat territory
Bats are the focus of Rodrigo Medellin Legorreta’s research and conservation efforts. He helped bring the lesser long-nosed bat, or tequila bat, back from the brink of extinction. Now, the conservation biologist is puzzling out how North America’s two biggest bats, the woolly false vampire bat and the spectral bat, manage to coexist on the Yucatan Peninsula and how they will cope with climate change and forest fragmentation due to the Maya Train.

Species (common name): food sources; mass
Leptonycteris yerbabuenae (lesser long-nosed bat): agave nectar and cactus fruits; 15-25 grams
Desmodus rotundus (common vampire bat): mammalian blood; 25-40 grams
Chrotopterus auritus (woolly false vampire bat): insects and small vertebrates; 75-96 grams
Vampyrum spectrum (spectral bat): insects and small vertebrates; 134-189 grams

Mexico's government touts the Maya Train as a bonanza for the Yucatan Peninsula. Rodrigo Medellin Legorreta and others contend it is fragmenting habitat and harming species.

Another casualty of López Obrador’s
disregard for environmental protections
is CONABIO. The agency, founded in 1992,
maintains rich databases on species abun- 
dance and distribution. It once wielded 
outsize influence on biodiversity policy. 
But last year, after cutting its budget, 
Lopez Obrador’s administration stripped 
the impoverished agency of its autonomy 
and took control of it. “For this adminis- 
tration, ecology is a nuisance, nature is a 
nuisance, the knowledge and protection 
of biological diversity is a nuisance,” la- 
ments ecologist José Sarukhan Kermez, 
who resigned as CONABIO’s head last year. 
CONABIO “was such a positive shining 
light,” Medellin Legorreta says. “Now, it’s 
hopelessly degraded.” 

López Obrador has dismissed Medellin
Legorreta and other critics as “pseudo- 
environmentalists.” That jab reso- 
nates with some academics who 
are uncomfortable with Medel- 
lin Legorreta’s celebrity status; 
environmental luminary David 
Attenborough dubbed him “the 
bat man of Mexico” in a 2014 
documentary. Colleagues either 
love him or hate him, Benitez 
Diaz says. “He’s controversial, he’s 
explosive, he’s very passionate,” 
he says. “He’s like a rock star, so 
he generates envy in the scientific 
community,” adds UNAM biologist 
Luis Zambrano Gonzalez. 

Zambrano Gonzalez, too, has 
been an outspoken critic of the 
Maya Train, and he contends that 
even though he and Medellin 
Legorreta have failed to stop the 
project, “we are winning by losing.”
Many Mexicans, he says, now have a
better understanding of the Yucatan’s biodi- 
versity and why it should be preserved. 

Despite this reputation, Medellin Legorreta 
prefers collaboration over confrontation. One 
big success has been working with tequila 
and mezcal distillers to conserve the tequila 
bat, which had landed on Mexico’s endan- 
gered species list after surveys in the 1980s 
found only a handful left in areas that once 
were home to thousands. 

The “bat volcano,” a cave from which millions of bats emerge every
evening at dusk, was threatened by the Maya Train until it was rerouted.

The tequila bat feeds on the flowers of the
agave (a kind of succulent) and pollinates
them. Wild agaves stockpile sugars for decades
before depleting their energy reserves
in a single flowering event, after which they die.
Tequila and mezcal distillers harvest the plants
before they flower, converting the ample
sugars into alcohol and thus depriving tequila
bats of the nectar they rely on. Many distill- 
ers “have completely forgotten their partners, 
the bats,” Medellin Legorreta says. By culti- 
vating the shoots of mature plants instead 
of allowing them to flower and propagate on 
their own, they are risking their own liveli- 
hoods: The shoots are genetic copies of the 
parent plant, so over time the agave’s genetic 
diversity has shriveled. A 2001 study found 
that tens of millions of agave plants in central 
Mexico were clones of just a handful of indi- 
viduals. Agave clones are more vulnerable to 
disease; in a 2011 die-off, about 40% of distill- 
ers’ agave fields were blighted. 

In 2014, Medellin Legorreta persuaded 
seven producers in Michoacan state to al- 
low 5% of their agave fields to flower, en- 
abling tequila bats—which fly up to 
100 kilometers in a single night—to 
pollinate the plants and boost their 
genetic diversity in the process. 
About 300,000 bottles a year of te- 
quila and mezcal now bear a “bat- 
friendly” label. 

Medellin Legorreta also relies on 
homegrown allies: a legion of for- 
mer students, some of whom now 
work in government. In a rare promising
development on Mexico’s
environmental frontlines, the Na- 
tional Commission of Natural Pro- 
tected Areas is drafting plans to 
create 200 protected areas across 
the country. The head of its prior- 
ity species division, José Eduardo 
Ponce Guevara, is a former stu- 
dent, and mentor and mentee are 
working together to ensure that
the sites have robust management plans.

As student Javier Torres Cervantes looks on, Rodrigo Medellin Legorreta examines a woolly false vampire bat that roosts in a Maya temple at the El Hormiguero site.

But Medellin Legorreta can’t contain a 
few criticisms, saying some of the proposed 
protected areas “are absurd.” One would 
encompass the entire upper Gulf of Cali- 
fornia, too big to be adequately protected, 
he says. At the other extreme is a proposed 
5000-hectare national park for jaguars in the 
northern Yucatan. “That’s like one-tenth the 
range of a single male jaguar.”

As Medellin Legorreta battles to 
safeguard Mexico's biodiversity from 
domestic threats, he is forging new 
alliances abroad. In January 2020, 
he founded a research and conser- 
vation network called Global South 
Bats, and in the coming weeks he 
will launch Global South Cats. The 
initiative will bring together conser- 
vation biologists from several coun- 
tries, who have struggled to work 
together in the past. “A big problem 
we have in carnivore conservation 
is that everyone is very territorial, 
and everyone has huge egos,” says 
Shivani Bhalla, executive director of 
Ewaso Lions, a nonprofit in Africa. 
“With Global South Cats, you’re ba- 
sically putting away your egos and 
really committing to a united effort.” 
The project aims to reduce conflicts between
big cats and people by preventing the
felines from preying on livestock and pets. 
Medellin Legorreta says it was inspired by 
an innovative rescue of Australia’s northern 
quoll. This kitten-size marsupial had devel- 
oped a taste for cane toads, imported from 
Central America decades ago to control beetles
that were decimating sugarcane. But the
toads produce a toxin that was killing quolls.
About 15 years ago, Australian scientists designed
a kind of aversion therapy. They injected
cane toads too small to kill a quoll
with thiabendazole, an antiparasitic agent
that induces nausea, and fed them to captive
juvenile quolls. The nausea left an impression:
When released into the wild, the quolls
were more likely to avoid bigger, lethal toads.
“Think of it like eating a bad
shrimp. Just the idea of eating another
shrimp is really repulsive,”
Medellin Legorreta says.

“When they do manage to bite you, you scream, and you curse,” Rodrigo
Medellin Legorreta says of the woolly false vampire bat.

To see whether jaguars could
be similarly duped, he and 
wildlife veterinarian Ivonne 
Cassaigne, a former student of his 
now with the nonprofit Primero 
Conservation, started a few years 
ago with a jaguar near Cancun, 
Mexico. It was attacking dogs at 
a local dump. Cassaigne “waited 
until the jaguar killed a particu- 
larly large dog that it didn’t finish 
eating,” Medellin Legorreta says. 
She then spiked the carcass with 
thiabendazole. The next night, 
the jaguar finished the dog—and 
as far as they could tell, never 
harmed another one. They have
since trained five other jaguars to avoid
cattle, goats, and sheep.

Ancient Maya temples, like this one at a site called El Hormiguero, are perfect roosts for bats, providing shelter, warmth, and darkness.

The strategy “has huge promise for big 
cats,” says Natalie Schmitt, a conservation 
geneticist at McMaster University who 
studies snow leopards in Central Asia. She 
has joined forces with Medellin Legorreta 
and Global South Cats, which will soon 
trial taste aversion in jaguars across Latin 
America, leopards in Africa, and tigers in 
Central Asia. 


UNDER A STARRY SKY in the Calakmul reserve,
as unseen frogs chirp and insects
trill, Medellin Legorreta patiently works 
to untangle a bat caught in a mist net. Lit 
by a headlamp, the hamster-size bat has 
a tuft of bristly hair under its nose and 
tiny eyes tucked close to its ears. Although 
this Wagner’s mustached bat (Pteronotus 
mesoamericanus) isn’t putting up a fight, 
Medellin Legorreta winces. “Arthritis in my 
right hand,” he says. “My wife tells me, ‘You’ve
taken too many bats out of nets.’” At least
10,000, he estimates. 

Tonight, Medellin Legorreta’s quarry is 
North America’s largest bat: the spectral bat 
(Vampyrum spectrum), which has a wing- 
span of up to 1 meter. “Vampyrum are really 
rare. We hardly know anything about them,” 
he says. Unlike many bat species, Vampyrum
bats form monogamous pairs and live
with their offspring in trees. In 2014, he em- 
barked on a long-term study to learn more 
about their presumed range in the southern 
Yucatan. He offered a $1000 reward to any- 
one who could point him to a roost. “Four 
months later, I had five roosts and a big hole 
in my pocket.” 

Ever since, Medellin Legorreta and his 
team have been tagging and tracking Vampy- 
rum bats and woolly false vampire bats, 
which roam the same woodlands in this cor- 
ner of the Yucatan. “I want to know exactly 
what allows these two species to coexist,” he
says. It may come down to differing hunt- 
ing styles. Vampyrum bats prey mostly on 
birds they find by smell, whereas the woollies 
swoop down on rodents they hear rustling 
on the forest floor. But Medellin Legorreta 
worries the two heavyweights may come to 
blows as forest fragmentation worsens and 
climate change brings further warming and 
prolonged droughts. “What would happen if 
there’s a shortage of roosting sites?” 

As Medellin Legorreta and his students 
wait, in vain, for a Vampyrum to snag in a 
mist net tonight, he peppers his students 
with pop quizzes and questions about their 
research plan. “I’m an intellectual vampire,”
he says. “I thrive on discussing ideas with
these kids.” The banter is laced with humor, 
but his tone sharpens when a student blanks 
on a species name, or is too clumsy freeing a 
bat from a mist net. “If you come to him with 
a research problem, you have to bring at least 
one possible solution. He’s not going to solve 
your problems for you,” Izquierdo Suzan says.

Medellin Legorreta is counting on these 
disciples to take up his mantle. “It’s late 
in my life and late in my career. I need to 
be very strategic about what I want to ac- 
complish and what I want to leave behind 
as a legacy.” Undoing the damage wrought 
by Lopez Obrador’s administration, 
Medellin Legorreta says, could begin as 
early as December 2024, when the next 
president is inaugurated. (Mexico’s presi- 
dents are limited to a single 6-year term.) 
Revitalizing CONABIO will then become a 
top priority, he says. 

But that looming struggle seems far from 
Medellin Legorreta’s mind at this moment in 
the enchanting rainforest of Calakmul. “My 
entire life is a dream,” he says, eyes closed. 
“I have to tell you, I’m one of the happiest
people I know.” His eyes snap open. “Inside, 
I’m still that 12-year-old boy holding a bat for 
the first time,” he says. “There are so many 
things I want to learn.”

Photocatalysis gets energized
by abundant metals


Cobalt compounds might enable cheaper 
and more-complex photocatalysis processes 


By Polina Yaltseva and Oliver S. Wenger 


Many chemical reactions that use
light require catalysts that include 
ruthenium or iridium, which are 
among the scarcest and most pre- 
cious elements of the periodic table. 
By contrast, there is substantial in- 
terest in compounds made up of abundant 
metals, such as iron and cobalt, which can be 
activated by light to form energetic (excited) 
states. In most cases, the excited-state life- 
times of compounds that contain abundant 
metals are shorter than those of compounds 
that contain precious metals, which limits 
their reactivity in photocatalysis. On page 191 
of this issue, Chan et al. (1) reveal a counterintuitive
relationship between the lifetimes
and energies of electronically excited states 
in iron(II) and cobalt(III) complexes. The ob- 
servations follow Marcus theory, a framework 
that explains electron transfer reactions. This 
finding opens the door to the fabrication of 
cheaper catalysts, which are needed, for example,
for the synthesis of medicinal agents
and in “green” hydrogen generation. 

When molecular compounds of iron(II) 
are illuminated with visible light, the result- 
ing excited state can subsequently relax by 
nonradiative processes—those that do not 
involve light emission. Usually, the electroni- 
cally excited state is located on the metal- 
cation component of the compound, and this 
metal-centered (MC) state stores a relatively 
small amount of energy, only 0.5 eV or less, 
because MC states have already undergone 
energy loss (relaxation) (2). By comparison, 
the photoactive excited states of precious 
metal compounds often store 2.0 eV or more. 
For applications of iron(II) complexes in solar
light harvesting, such as in water splitting,
dye-sensitized solar cells, or photocatalysis,
it is desirable to increase both the
amount of energy stored in the MC excited 
states and the lifetimes of these states. 

Experiments that used compounds with 
different ligands bound to the iron(II) cat- 
ion revealed that the higher the MC excited- 
state energies get, the faster the MC excited 
state decays (1-3). This is a dilemma, because 
higher energies allow the compound to catalyze
a greater scope of potentially valuable
photoreactions, yet, at the same time, long
MC-state lifetimes are needed for the excitation
energy to be transferred to a reaction
partner in solution reactions. If the excited 
state decays too quickly, encounters by dif- 
fusion between the metal compound and re- 
actant molecules cannot occur, and thus the 
photochemical reactivity is shut down. A few 
nanoseconds are usually enough for produc- 
tive diffusional encounters to occur, but the 
lifetimes of MC excited states of iron(II) com- 
pounds are often near this time limit (2, 3). 
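
The timescale argument can be made concrete as a simple competition between excited-state decay and diffusional encounter with a reactant. The sketch below is illustrative only: the bimolecular rate constant of 1e10 per molar per second (a typical order of magnitude for diffusion-limited reactions in common solvents) and the 0.1 M reactant concentration are assumptions, not values taken from the studies cited here, and the function name is hypothetical.

```python
def encounter_fraction(lifetime_s, reactant_molar, k_diff=1.0e10):
    """Fraction of excited states that meet a reactant before decaying.

    lifetime_s:     excited-state lifetime in seconds
    reactant_molar: reactant concentration in mol/L (assumed value)
    k_diff:         assumed diffusion-limited rate constant in 1/(M s)
    """
    encounter_rate = k_diff * reactant_molar  # encounters per second
    decay_rate = 1.0 / lifetime_s             # intrinsic decay per second
    return encounter_rate / (encounter_rate + decay_rate)


if __name__ == "__main__":
    # Lifetimes spanning picoseconds to tens of nanoseconds.
    for lifetime in (1e-12, 1e-10, 1e-9, 6e-9, 55e-9):
        fraction = encounter_fraction(lifetime, reactant_molar=0.1)
        print(f"lifetime {lifetime:.0e} s -> encounter fraction {fraction:.3f}")
```

Under these assumed conditions, a picosecond-lived state almost never meets a reactant, whereas a state that lives for a few nanoseconds is intercepted a large share of the time.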

By turning from iron(II) to cobalt(III), this
dilemma has now been overcome by Chan et 
al. Because cobalt(III) and iron(II) have the 
same electronic configuration, compounds 
with these ions can have the same MC 
state. However, the higher oxidation state 
of cobalt(III) increases the MC-state ener- 
gies relative to those of iron(II) (4, 5). Chan 
et al. found that the increase in excited-state 
energy in cobalt(III) comes with an increase 
in its excited-state lifetime, in contrast to the 
behavior previously found for the iron(II) 
compounds. The overall trend observed by 
Chan et al. when going from iron(II) compounds
with low MC-state energies to the
high MC-state energies of the cobalt(III)
compounds can be understood in the framework
of Marcus theory. Originally formulated
to describe how the rates of electron
transfer depend on the free energy of elec- 
tron transfer reactions (6), Marcus theory is 
useful for understanding and controlling the 
rates of nonradiative MC-state relaxation as a 
function of the MC-state energy. 

Compounds with more-energetic and long-lived excited states can enhance photocatalysis
The excited states of iron(II) and cobalt(III) compounds (blue and green, respectively) relax to their ground states (gray) (left). The relaxation rates increase with excited state energy in the “normal” regime (iron compounds) but decrease with excited state energy in the inverted region (cobalt compounds) (right).

Chan et al. noted that as the MC-state
energy increases between iron(II)
tris(2-pyridylmethyliminoethyl)amine
{[Fe(tren(py)₃)]²⁺} and iron(II) tris(2,2’-bipyridine)
{[Fe(bpy)₃]²⁺}, the energy needed
to cross the activation barrier (ΔG‡) for the
excited MC state to relax into the ground
state decreases, thereby shortening the MC
lifetime from 55 ns to roughly 1 ns (see the
figure). When the MC-state energy increases
further in a family of cobalt(III) tris(2,2’-bipyridine)
{[Co(bpy*)₃]³⁺} compounds, the MC
lifetime lengthens to 6 ns in the best case because
of the buildup of a new activation barrier
that must be overcome so that the MC
excited state can relax to the ground state.
The rate of the MC-state relaxation (the inverse
of the MC-state lifetime) as a function
of the MC-state energy follows a bell-shaped
curve, analogous to the energy dependence
of electron transfer rates in Marcus theory (6,
7). In the “normal” regime, which is encountered
by iron(II) compounds, relaxation rates
increase with increasing excited state energy,
whereas in the “inverted” region, which is encountered
by cobalt(III) compounds, relaxation
rates decrease with increasing
energy.
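
A minimal numerical sketch of this bell-shaped dependence, assuming the classical Marcus expression in which the activation barrier for a relaxation that releases energy E is (λ − E)²/4λ, with λ the reorganization energy. The value λ = 1.0 eV and the function names are hypothetical and were chosen only to show the two regimes; they are not parameters reported by Chan et al.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def activation_barrier(energy_ev, reorg_ev):
    """Classical Marcus barrier (eV) for a relaxation releasing energy_ev."""
    return (reorg_ev - energy_ev) ** 2 / (4.0 * reorg_ev)


def relative_rate(energy_ev, reorg_ev, temperature_k=298.0):
    """Relaxation rate relative to the barrierless maximum (dimensionless)."""
    barrier = activation_barrier(energy_ev, reorg_ev)
    return math.exp(-barrier / (K_B_EV * temperature_k))


if __name__ == "__main__":
    reorg = 1.0  # hypothetical reorganization energy in eV, for illustration
    for energy in (0.4, 0.7, 1.0, 1.3, 1.6):
        regime = "normal" if energy < reorg else "inverted or peak"
        rate = relative_rate(energy, reorg)
        print(f"E = {energy:.1f} eV ({regime}): relative rate {rate:.3e}")
```

With these assumed numbers, the rate climbs as E approaches λ and falls again once E exceeds it, which is the inverted behavior attributed above to the cobalt(III) compounds.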

The excited states of cobalt(III) compounds
are very strong oxidants, with potentials
approaching 1.7 V versus the
saturated calomel electrode (SCE),
which is roughly 0.5 V more oxidizing than
some of the benchmark iridium(III) photooxidants
that are widely used in synthetic
photochemistry (8). This makes them attractive
for photocatalysis and enables new
photoreactivity, including, for example, the 
oxidative coupling of aryl amides and aryl 
boronic acids, as demonstrated by Chan et 
al. This reaction type, which is of interest, 
for example, for the synthesis of medicinal 
agents, had previously proved very difficult 
to carry out with some reactants.

The work reported by Chan et al. is concep- 
tually groundbreaking, yet some challenges 
remain for future research. The amount of 
visible light absorbed by the cobalt(III) com- 
pounds is low, and a substantial portion of 
the excitation energy is lost between light 
absorption and photocatalysis. It seems evi- 
dent that after decades of research focused 
on iron(II), its neighboring metal elements 
in the periodic table with identical electron 
configurations might deserve more attention (9).

IMMUNOTHERAPY

Engineered bacteria guide T cells to tumors

T cells and bacteria are engineered to work together to find and destroy tumor cells


By Eric M. Bressler and Wilson W. Wong 


Since the US Food and Drug
Administration (FDA) approved
the first chimeric antigen receptor
(CAR)-T cell product for acute
lymphoblastic leukemia (ALL), the
promise of similar success against
nonhematologic cancers has fueled CAR-T 
cell development. CAR-T cells are a pa- 
tient’s own T cells engineered to express 
a synthetic CAR that enables native T cell 
signaling upon recognition of a chosen 
target antigen (1). Despite five additional
CAR-T cell approvals for hematologic ma- 
lignancies, efficacy against solid tumors 
remains elusive. On page 211 of this issue, 
Vincent et al. (2) report engineering bacte- 
ria to infiltrate solid tumors and produce 
antigens that selectively attach to extracel- 
lular matrix (ECM) proteins in the tumor 
microenvironment (TME); they also re- 
lease a chemokine. The antigen and che- 
mokine payloads recruit CAR-T cells into 
the tumors, resulting in reduction in tu- 
mor volume in mouse models of leukemia, 
colorectal cancer, and breast cancer. The
platform promises an antigen-agnostic targeting
strategy with substantial customization
potential.

Hematologic malignancies lend them- 
selves to CAR-T cell therapy because tumor 
cells express ubiquitous targets (e.g., CD19); 
associated on-target, off-tumor toxicity is 
amenable to clinical intervention; and T 
cells efficiently traffic to tumors and malignant
cells in circulation. However, solid tumors
express heterogeneous and nonspecific
antigens, and they are poorly infiltrated by 
T cells. Thus, on-target, off-tumor toxicity, 
wherein CAR-T cells attack the targeted antigen
on healthy tissue, can lead to potentially fatal
effects in patients treated with CAR-T cells 
(3). Furthermore, heterogeneous expression 
of antigens tempers overall efficacy and en- 
ables continued proliferation of cancer cells 
that do not express the targeted antigen (4). 
Dense stroma and poor vascularization further
inhibit T cell entry into solid tumors (1).

Combining two immunotherapies
The probiotic-guided chimeric antigen receptor (CAR)-T cell (ProCAR) platform enables antigen tagging of tumors, adjuvant immune stimulation, and enhanced T cell trafficking to tumors. Engineered attenuated Escherichia coli Nissle 1917 release ProTag proteins into the extracellular space during intratumoral colonization. ProTag comprises a heparin binding domain to bind the extracellular matrix (ECM), and thereby “tag” tumors, and green fluorescent protein (GFP), which CAR-T cells are engineered to recognize. The E. coli Nissle 1917 could also be programmed to produce chemokines to drive CAR-T cell infiltration into solid tumors and inflammation.

Strategies to overcome these challenges
include incorporating genetic circuitry
(e.g., logic-gated CAR-T cells) (5), switchable
adapter proteins (6, 7), and targeted
biomaterial enhancements (8). Vincent et
al.’s probiotic-guided CAR-T cell (ProCAR)
platform showcases the utility of engi- 
neered bacteria as a new enhancement to 
CAR-T cell therapy. Probiotic therapy for 
cancer involves intravenous or oral admin- 
istration of attenuated bacterial strains to 
enable tumor colonization and stimulate 
immune system activity against tumors. 
Clinical trials have demonstrated safety, 
though not efficacy, for patients with substantial
solid tumor burden (9, 10).

Combining probiotic therapy with CAR-T 
cell therapy, Vincent et al. used Escherichia 
coli Nissle 1917 engineered to release an im- 
munogenic payload throughout the tumor 
upon colonizing the TME (see the figure). 
Therapeutic bacterial tumor colonization 
traditionally occurs through the natural 
proclivity of engineered bacteria for the im- 
mune-privileged necrotic tumor core after 
intravenous or intratumoral injection, and 
attenuation strategies such as chromosomal 
deletion and lipopolysaccharide (LPS) modi- 
fication enable greater than 10,000-fold 
tumor accumulation without significant 
growth in healthy tissue (11). The released 
“ProTag” antigen comprises green fluorescent
protein (GFP) linked to the heparin binding
domain (HBD) of placenta growth factor 2
(PlGF-2), which enables efficient binding to
tumor ECM through HBD and bio-orthogonal
antigen recognition by CAR-T cells
through GFP. Thus, tumors colonized with
the bacteria will be coated in ProTag, making
them vulnerable targets for circulating CAR-T
cells targeting GFP.

Vincent et al. tested the ProCAR platform 
in both humanized and immunocompetent 
mouse models of leukemia, colorectal can- 
cer, and breast cancer. ProCAR reduced tu- 
mor volume compared to mock treatment 
and treatment with probiotics that express 
nonfunctional ProTag in all models. They
demonstrated enhanced survival in a model
of aggressive leukemia. Moreover, T cell tumor
infiltration and tumor clearance were
enhanced in the ProCombo system, in which
the bacteria also released an activating mutant
chemokine, C-X-C motif chemokine 16
(CXCL16 K42A). In particular, the success of
ProCombo suggests the potential for greater 
engineering complexity of the probiotic, 
which could enable optimization and modi- 
fication without increasing the complexity of 
treatment administration. 

Additionally, Vincent et al. demonstrate 
key safety features of the ProCAR system in 
mice. ProTag binding was specific for CAR-T
cells, so immune cell targeting and attendant
treatment failure are unlikely to occur.
Furthermore, both bacteria and ProTag
localized to the tumor, rather than healthy
tissue. Thus, both bacterial blood growth
(bacteremia) and on-target, off-tumor tox- 
icity may be avoidable with careful probi- 
otic attenuation and protocol optimization. 
Studies demonstrating T cell response to 
bacterial adjuvants in vitro and in vivo sug- 
gest a high potential for this strategy in im- 
munologically “cold” tumors, which escape 
immune detection through low expression 
density of neoantigens. 

Translation of the ProCAR system to the 
clinic will depend on scalability to larger 
tumors and attenuation of bacterial strains 
for safety. Vincent et al. note the possibil- 
ity of tumor regions that could be untagged 
or may not be colonized by bacteria; such 
areas could be substantial in bulky tumors. 
Human tumors that are 2 cm in diameter, a 
lower threshold for the size of advanced tu- 
mors of a variety of cancer types in humans, 
are 20- to 40-fold larger than the mouse tumors
in this study (12, 13). Future studies will
require an investigation of the diffusion distance
of ProTag through progressively larger
tumors. ProTag production must be tailored
to balance the intratumoral bacterial load,
subsequent immune response, and ProTag
density within the tumor. Additionally, hu- 
mans mount a far more robust immune re- 
sponse against bacteria and the endotoxins 
that they produce than do mice. A phase I 
study of intravenous administration of at- 
tenuated Salmonella typhimurium remains 
the strongest evidence that tumor coloniza- 
tion with intravenous probiotics is safely 
achievable. However, dose-limiting bacte- 
remia and other toxicities were observed 
at high doses (8). Thus, there is precedent 
to indicate that ProCAR could be safely 
achieved in humans with bacterial attenu- 
ation and efficacy at low bacterial dosage. 
The study of Vincent et al. is an important
proof-of-concept for a potential approach 
to treating heterogeneous, immunologically 
cold, and poorly infiltrated solid tumors. 
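
To make the scale-up concern concrete, the size comparison above can be checked with simple geometry. The sketch below is illustrative only: it assumes tumors are roughly spherical and that "20- to 40-fold larger" refers to volume, neither of which is stated in the text, and the function names are hypothetical.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere, used as a rough stand-in for tumor shape (assumption)."""
    radius = diameter_mm / 2.0
    return (4.0 / 3.0) * math.pi * radius ** 3


def sphere_diameter_mm(volume_mm3: float) -> float:
    """Diameter of a sphere with the given volume (inverse of the function above)."""
    return 2.0 * (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


# Human tumor of 2 cm (20 mm) diameter, as quoted in the text.
human_volume_mm3 = sphere_volume_mm3(20.0)

# Mouse tumor volumes implied if the human tumor is 20- to 40-fold larger by volume.
implied_mouse_volumes_mm3 = {fold: human_volume_mm3 / fold for fold in (20, 40)}

for fold, volume in implied_mouse_volumes_mm3.items():
    print(
        f"{fold}-fold smaller: {volume:.1f} mm^3, "
        f"equivalent diameter {sphere_diameter_mm(volume):.1f} mm"
    )
```

Under these assumptions, a 2-cm human tumor has a volume of about 4.2 cm3, which would imply mouse tumors of roughly 105 to 209 mm3. The diffusion distance that a secreted tag would need to cover therefore grows with the cube root of the fold difference in volume, not with the fold difference itself.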
A signaling lipid
drives synapse 
formation 


Phosphatidylinositol 
3,5-bisphosphate enables 
transport of proteins to 
synaptic sites 


By Pilar Rivero-Rios and Lois S. Weisman 


Brain function relies on the accurate
formation of synaptic contacts, and
alterations in this process are implicated
in disease, including some
psychiatric and neurodevelopmental
disorders. Synapse formation requires
the synthesis and transport of presynaptic
proteins from the soma (which
contains the nucleus) to the axon terminal. 
The identity and regulation of the transport
organelle or organelles that are responsible
for the assembly of presynaptic sites are not
yet clear. On page 223 of this issue, Rizalar
et al. (1) define the ultrastructure of these
organelles and show that they are precursor
vesicles that transport several types
of presynaptic proteins at once. They also
show that the signaling lipid phosphatidylinositol
3,5-bisphosphate [PI(3,5)P2] plays
a key role in their transport along the axon
by binding to kinesin family member 1A
(KIF1A), which is a microtubule-dependent
motor protein. Further investigation may aid
the development of therapies for disorders 
that are characterized by defective synapses. 
During neuronal development, presynap- 
tic proteins such as synaptic vesicle proteins, 
active zone proteins (which provide the site 
for docking and release of synaptic vesicles), 
P/Q-type calcium channels (which are involved
in presynaptic vesicle release) and adhesion
molecules (for example, neurexin-1β),
are produced in the soma and transported 
along the axon in precursor vesicles (PVs) 
to nascent presynaptic sites. Two models for 
this transport have been proposed. In one, 
multiple types of vesicles act as carriers, each 
transporting a different class of synaptic 
proteins. In the other model, multiple types 
of synaptic proteins are transported by the 
same carrier or cluster of carriers. To distinguish
between the two options, Rizalar et al.
studied glutamatergic neurons (iNs) derived 
from human induced pluripotent stem cells 
in vitro. These cells expressed a fluorescently 
tagged version of the synaptic vesicle pro- 
tein synaptophysin. 

Consistent with the cotransport model, 
Rizalar et al. found that synaptophysin was 
transported in puncta that move antero- 
gradely (from soma to axon terminal) and 
which also contained other synaptic vesicle 
proteins, active zone proteins,
and neurexin-1β. It is
not yet clear whether these 
puncta represent single ves- 
icles or clusters of vesicles. 

The PVs did not con- 
tain components of the 
secretory pathway, re- 
cycling endosomes, or 
mitochondria but dis- 
played partial cotrans- 
port with the early 
endosomal marker 
ras-related protein 
RAB5A, and around 
half contained lysosomal 
markers. Rizalar et al. 
show that the PVs are distinct
from mature lysosomes
because they are not
acidic and do not have degradative
activity. Thus, as
previously reported (2), PVs
represent a distinct organ- 
elle that may derive from 
a pathway that also sorts 
lysosomal proteins. Future studies using 
long-term live-cell imaging may reveal the 
origin of this pathway, which could involve 
endosomal compartments and/or trans- 
Golgi network-derived vesicles. 

Rizalar et al. used acute chemical dimer- 
ization to couple PVs to the outer membrane 
of mitochondria, which are much larger and 
morphologically distinct organelles, and an- 
alyzed them with focused ion beam milling 
scanning electron microscopy (FIB-SEM) 
to visualize the PV ultrastructure. The PVs 
were vesicular and tubular structures mea- 
suring 67 to 220 nm. This finding aids the 
interpretation of previous work in mouse 
saphenous nerves and rat hippocampal neu- 
rons that proposed a similar morphology of 
transport carriers (3, 4). The discovery also 
further supports the conclusion that PVs 
are distinct from other organelles. A small 
fraction of vesicles contained electron-dense 
material or intralumenal vesicles, suggest- 
ing that they originate, at least in part, from 
the degradative pathway. 

KIF1A is a motor protein that interacts 
with ADP-ribosylation factor-like protein 
8A (ARL8A) and ARL8B, which are small 
guanosine triphosphatases (GTPases) whose
depletion causes the accumulation of synaptic
vesicle and active zone proteins in proximal
axons (5-7). Consistent with this, Rizalar
et al. found that loss of KIF1A or ARL8A
and ARL8B in iNs resulted in a decrease in
the amounts of synaptic vesicles or active
zone proteins at the presynaptic site. This
indicates that KIF1A is the main motor protein
that promotes the anterograde transport
of PVs along the axon (8-10). However,
the amount of presynaptic P/Q-type calcium
channels was not affected, suggesting that
other mechanisms are involved in the axonal
transport of these channels.

Transport of presynaptic proteins
Presynaptic proteins are generated in the soma and transported along the axon in
precursor vesicles. Proteins cotransported in the same type of precursor vesicle include
SV proteins, AZ proteins, and the adhesion molecule NRXN1B, but not presynaptic Ca2+
channels. PIKFYVE-mediated synthesis of PI(3,5)P2 regulates the anterograde transport
of precursor vesicles by promoting the recruitment of the kinesin motor KIF1A.

Rizalar et al. also found that the signaling
lipid PI(3,5)P2 is necessary for PV transport.
PI(3,5)P2 is a phosphoinositide lipid
that is generated by 1-phosphatidylinositol
3-phosphate 5-kinase (PIKFYVE), a lipid
kinase implicated in the function of many
types of organelles (11). Importantly, PIKFYVE
is critical for brain function. Indeed,
defects in PI(3,5)P2 synthesis are related to
some inherited, childhood-onset neurodegenerative
diseases, and reduced PIKFYVE
activity affects synaptic function in cultured
neurons (11-13).

KIF1A has a phosphoinositide-binding
pleckstrin homology (PH) domain that
binds other phosphoinositide lipids in vitro
(14), and Rizalar et al. found that the
PH domain of KIF1A specifically associates
with PI(3,5)P2 on liposomes. Notably,
in previous work, a point mutation in the
PH domain of the Caenorhabditis elegans
ortholog of KIF1A (UNC104) decreased its
localization to presynaptic vesicles (15).
ARL8, ADP-ribosylation factor-like protein 8; AZ, active zone; KIF1A, kinesin family member 1A;
NRXN1B, neurexin-1β; PI(3,5)P2, phosphatidylinositol 3,5-bisphosphate
Similarly, in the study by Rizalar et al.,
inhibition of PI(3,5)P2 synthesis or genetic
depletion of PIKFYVE resulted in the dissociation
of KIF1A from motile PVs in iNs
and a decrease in the proper localization of
active zone and synaptic vesicle proteins.
These data indicate that PI(3,5)P2 recruits
KIF1A to PVs to promote their anterograde
axonal transport (see the figure). Given the
multiple roles of PI(3,5)P2 in the endolysosomal
system (11), it may also contribute in
other ways to synaptic protein
synthesis and delivery.

The study by Rizalar et al. offers key molecular
insights into presynaptic biogenesis. However,
many questions remain. For example, it is not yet
clear what the source pathway of PVs is
or whether PVs represent a homogeneous population.
Furthermore, whether there are additional proteins
that regulate the synthesis and transport of PVs
and whether the mechanisms governing presynaptic
formation vary among different neuron types
are not known. Other outstanding questions include
whether the same mechanisms are involved in
presynaptic plasticity and how calcium channels are
transported to presynaptic sites. Answering
these questions will be critical to dissect
the mechanisms of presynaptic formation
and how failures in these pathways contribute
to neurological disorders.
Variant-adapted COVID-19 booster vaccines


Current vaccines should be tailored to combat future SARS-CoV-2 variants 


By Florian Krammer and Ali H. Ellebedy


Severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) vaccines
greatly reduced the burden of the
COVID-19 pandemic and saved millions
of lives (1). However, waning
levels of circulating neutralizing an- 
tibodies and the continuous emergence of 
SARS-CoV-2 variants of concern decreased 
vaccine effectiveness and prompted the need 
for booster immunizations. Some variants 
are substantially antigenically distant from 
the ancestral strain, propelling the devel- 
opment of variant-adapted vaccines. The 
entanglement of waning immunity and vi- 
ral variants in terms of immune protection, 
variant selection in updated vaccines, ef- 
fectiveness of these vaccines in inducing de 
novo responses to new variants, and when to 
change them in the future remain outstand- 
ing knowledge gaps. 

The primary series of SARS-CoV-2 vac- 
cines were rolled out at the end of 2020 and 
beginning of 2021. These immunizations 
elicited high levels of spike-specific anti- 
bodies (which recognize the spike protein 
expressed on the surface of SARS-CoV-2) 
that dropped considerably (by 10- to 15-fold) 
over the first few months after vaccination 
(2). This decline coincided with the emer- 
gence and spread of the Delta variant in the 
Northern Hemisphere spring and summer of 
2021. Although the decay of vaccine-induced 
antibody responses raised some concerns, it 
can be explained by the dynamics of B cell 
responses to vaccination: The initial wave 
of antibodies is produced by short-lived 
plasmablasts, which are a terminally dif- 
ferentiated subset of responding B cells (3). 
Antibodies produced by this initial response 
in the serum of vaccinated individuals have a 
half-life of weeks; thus, circulating antibody 
titers decline slowly in the few months after 
vaccination. However, serum spike antibody 
levels started to plateau 6 to 9 months after 
vaccination (2), indicating the induction of 
long-lived bone marrow plasma cells. This 
subset can potentially persist for the lifetime 
of the host. Neutralizing serum antibody 
titers are a strong correlate of protection 
from symptomatic SARS-CoV-2 infection (4). 


Waning neutralizing antibody levels result in 
a higher probability of breakthrough infec- 
tions, especially with new variants. When 
booster doses were rolled out, serum anti- 
body levels increased again, and longer-term 
antibody titers plateaued at a higher baseline 
compared to preboosting levels. Therefore, 
the third vaccine dose may be important for 
long-term protection and could be consid- 
ered part of the primary vaccination regimen 
in, for example, children. 
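
The relationship between antibody half-life and the observed fold drop can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, assuming simple exponential decay with no ongoing antibody production and a hypothetical half-life of 3 weeks; the text says only that the half-life is on the order of weeks, and the function names are illustrative.

```python
import math

# Hypothetical value for illustration; the text states only "a half-life of weeks".
ASSUMED_HALF_LIFE_WEEKS = 3.0


def half_lives_for_fold_drop(fold_drop: float) -> float:
    """Number of half-lives needed for a titer to fall by the given fold."""
    if fold_drop <= 0:
        raise ValueError("fold_drop must be positive")
    return math.log2(fold_drop)


def weeks_for_fold_drop(
    fold_drop: float, half_life_weeks: float = ASSUMED_HALF_LIFE_WEEKS
) -> float:
    """Time for the given fold drop under pure exponential decay (assumption)."""
    return half_lives_for_fold_drop(fold_drop) * half_life_weeks


for fold in (10, 15):
    print(
        f"{fold}-fold drop: {half_lives_for_fold_drop(fold):.2f} half-lives, "
        f"about {weeks_for_fold_drop(fold):.1f} weeks"
    )
```

Under these assumptions, a 10- to 15-fold drop corresponds to about 3.3 to 3.9 half-lives, or roughly 10 to 12 weeks, which is in line with a decline over the first few months after vaccination. The later plateau attributed to long-lived plasma cells is not captured by this single-exponential model.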

The emergence of variants of concern 
complicated the relatively simple relation- 
ship between attaining protection after im- 
munization with vaccines based on the an- 
cestral SARS-CoV-2 strain (5). It is critical to 
disentangle the issue of waning immunity 
from immune escape by emerging variants. 
Initially, the Delta variant caused widespread 
breakthrough infections, but it did not have 
a strong direct immune escape phenotype 
compared to the contemporaneous Beta vari- 
ant. But Delta replicated efficiently in the 
upper respiratory tract and had a shorter 
incubation period between infection and 
symptom onset (6, 7). These qualities are 
indirect immune escape mechanisms: More 
replication in the upper respiratory tract 
means more viral shedding, which, in turn, 
means that individuals exposed to infected 
people will likely inhale higher viral loads. 
This increased load can overwhelm preexist- 
ing immunity, leading to a higher chance of 
breakthrough infections. The shorter incuba- 
tion period limits the amount of time to re- 
call immune responses before the infection 
becomes symptomatic. 

When the first Omicron spike sequences 
emerged in the Northern Hemisphere au- 
tumn of 2021, it became clear that this variant 
would have extensive direct immune escape 
because many of the antigenic sites targeted 
by neutralizing antibodies had changed. 
Moreover, Omicron and its subvariants also 
have a short incubation time, even shorter 
than that of the Delta variants, making a re- 
call (anamnestic) immune response even less 
effective in protecting from symptomatic in- 
fection (7). As a result of the substantial drop 
in neutralizing serum antibody titers induced 
by ancestral vaccines when tested against 
the Omicron variants, updated vaccines
containing spikes from the ancestral and
Omicron BA.5 variants were licensed by the 
US Food and Drug Administration (FDA) in 
the autumn of 2022. Similarly, the European 
Medicines Agency (EMA) approved the first 
bivalent vaccines that contained the ances- 
tral and BA.1 spike, followed by licensure of 
another bivalent vaccine containing the an- 
cestral and BA.5 spikes. 

Clinical trials comparing ancestral spike 
vaccines with bivalent ancestral-BA.1 spike 
(Moderna and Pfizer) or monovalent BA.1 
spike (Pfizer) showed that these updated
vaccines induced slightly better neutraliz- 
ing antibody titers against BA.1 compared 
to those induced by the ancestral vaccines. 
Notably, the monovalent BA.1 vaccine per- 
formed better than the bivalent vaccine in 
a direct comparison (8). After the rollout of 
the bivalent ancestral-BA.5 vaccines—which 
had not been tested in clinical trials before 
licensure—the main question was whether 
these vaccines would induce higher neutralizing
antibody titers against BA.5 and newer
Omicron subvariants compared to mon- 
ovalent ancestral booster vaccines. Several 
studies showed that the titers induced by 
the updated vaccines were only marginally 
higher, whereas other studies demonstrated 
a substantial difference (9-12). 

Many of these comparisons, however, 
have been complicated by the timing of 
serum sample collection from vaccinated 
individuals, with samples from individu- 
als who received the monovalent ances- 
tral booster collected in the first half of 
2022 and samples from those who received the bivalent
booster collected later, in autumn
2022. Bivalent booster recipients may have 
experienced breakthrough infections be- 
tween these two time points, which may 
bias comparisons. The vaccine rollout in 
Europe, however, allowed direct compari- 
son of antibody titers induced by the bi- 
valent BA.1 and bivalent BA.5 boosters. A 
study showed that the bivalent BA.5 vac- 
cine slightly skewed the response to BA.5, 
whereas the bivalent BA.1 vaccine did not 
skew to BA.1, potentially owing to inherently
lower immunogenicity (13). This result
highlights possible inherent immunogenicity
differences between spikes, which
should be considered when selecting new
vaccine strains. 

A key question is whether vaccination with 
an updated variant would induce de novo 
responses to that strain or if only recall re- 
sponses against shared epitopes would be 
mounted. It has been observed, for example, 
with influenza virus, that the first exposures 
to antigen leave an immunological imprint— 
immune memory—that then skews responses 
to new but related strains toward epitopes 
that are shared between the initial strain and 
the new strain, with little de novo response 
specific for the new strain. For COVID-19 vac- 
cines, serum antibody titers induced by bi- 
valent vaccines consist of antibodies specific 
to the ancestral spike and antibodies that 
are cross-reactive between both spikes (14).
However, BA.5-specific reactivity in serum
samples of recipients of bivalent BA.5 vaccines
was not detected (14). This suggests that
the initial response to bivalent BA.5 vaccination
is mostly a recall response focusing on
shared epitopes (see the figure). Consistently,
only 6 of 378 memory B cell-derived spike
monoclonal antibodies (mAbs) that were
isolated from individuals boosted with a
monovalent BA.1 vaccine recognized the BA.1
spike protein, but not the ancestral one (15).
All remaining mAbs identified both spike 
proteins. These data suggest that, although
BA.1 or BA.5 spike sequence-specific reactivity
is not detectable in serum after booster
vaccination, the immune system is capable of 
inducing BA.1- and BA.5-specific B cells that 
become part of the memory B cell pool. 

These findings are not necessarily bad 
news. Cross-reactive B cells may express neu- 
tralizing antibodies that already bind with 
high affinity to the new variant. Some may 
bind with low affinity but can undergo ad- 
ditional rounds of affinity maturation to en- 
hance binding to the spike antigen. It makes 
sense that biochemical solutions for anti- 
body-antigen interactions are reoptimized 
for an antigen that has changed, instead of 
finding new binding solutions. Notably, a po- 
tently neutralizing cross-reactive antibody is 
as good as a de novo strain-specific neutralizing
antibody with the same potency if both
are present at the same titer. Thus, cross-reac- 
tive neutralizing antibodies are likely to be as 
protective as strain-specific ones. In addition, 
the de novo response detected in the memory 
compartment, even if occurring at a very low 
frequency, could be expanded by an encoun- 
ter with a variant virus related to Omicron. 
Overall, given the efficiency of engaging 
cross-reactive memory B cells by Omicron- 
derived spike antigens, the continued inclu- 
sion of the ancestral strain in booster immu- 
nizations is highly questionable. 


In the meantime, SARS-CoV-2 has not 
stopped evolving. The last dominant BA.5 
descendant, BQ.1.1, has been outcompeted 
by recombinant BA.2 descendants of the XBB 
lineage, including EG.5, which is increasing 
in frequency and shows a stronger immune 
escape phenotype. A new BA.2-derived lin- 
eage, BA.2.86, has also recently been detected 
and is characterized by a large number of 
amino acid changes. The FDA, EMA, and 
World Health Organization (WHO) have rec- 
ommended that COVID-19 vaccines should be 
updated again, for autumn 2023, and annual 
booster doses containing the newest variants 
may be recommended every year, similar to 
influenza virus vaccines. Which strain should 
be included? It is not useful to include the 
ancestral strain in updated booster doses in 
the autumn of 2023, and thus the recommendations
of FDA, EMA, and WHO for updating
to an XBB-based monovalent vaccine with a
preference for XBB.1.5 make sense.

Dynamics of B cell and antibody responses to COVID-19 vaccines
Immunization based on the primary series of ancestral (WA1/2020 spike protein) severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccines followed by
boosting with a variant-based vaccine (BA.1 spike) initially elicits a subset of spike-specific B cells, which rapidly proliferate and differentiate into antibody-secreting,
short-lived plasmablasts. Some activated B cells form a germinal center reaction where responding B cell clones undergo iterative rounds of somatic hypermutation and
affinity maturation of their B cell receptor against the inciting antigen that is presented by follicular dendritic cells (FDCs). Graduates of the germinal center differentiate
into long-lived bone marrow plasma cells (BMPCs) and circulating memory B cells (MBCs). Cells that develop to recognize the ancestral spike protein may also
cross-react to the new variant spike (red), whereas other cells may be variant-specific (blue and orange).

It is unclear whether annual SARS-CoV-2
vaccine updates are the best solution moving
forward. It is also unknown if the mRNA
vaccine platform is best suited for updated
boosters or whether alternative platforms
(such as recombinant protein-based vaccines
or new technologies) or heterologous boosting
will be more effective and better accepted
by the population. Should an updated vaccine
be administered once or twice? Vaccine
equity is an important issue that emerged 
during the pandemic, and it remains unre- 
solved how annual boosters, if needed, can 
be made available to the global population. 
Moreover, should the vaccine be offered to 
everyone or only to highly vulnerable popu- 
lations? And which regimen would be ideal 
for SARS-CoV-2-naive children? Practically, 
annual booster doses are easier to implement 
than ad hoc boosting whenever a new vari- 
ant emerges. However, for healthy individu- 
als, annual boosters may not be needed, and 
boosting only when a more virulent variant 
emerges may be more suitable. In this case, 
two doses given at a month’s interval may en- 
hance variant-specific responses. By contrast, 
for individuals with preexisting conditions 
that make them vulnerable to COVID-19, a 
regular annual booster will likely increase 
protection from severe disease regardless of 
the circulating variant. 

Although similar systems have been in 
use for influenza vaccines, the “influenza 
model” may not be an optimal solution 
in the long term for SARS-CoV-2. Indeed, 
a new generation of influenza virus vac- 
cines is being designed to give broader and 
longer-lasting protection independently 
of antigenic changes of the virus and to 
replace the current annual vaccinations. 
Similar broadly protective vaccines, poten- 
tially given mucosally (e.g., intranasally), 
are urgently needed for SARS-CoV-2. These 
vaccines will provide a much-needed ad- 
vantage in the ongoing struggle against 
this virus’s rapid evolution. 

The numbers of gray whales 
(Eschrichtius robustus) 
are fluctuating as a result of 
environmental changes. 


The ecology of whales in a changing climate 

Some whale populations are exhibiting unexpected cycles of boom and bust 

By Andrew J. Read 


The study of whale ecology did not begin 
until most populations had been 
depleted by commercial whaling. 
Some species still teeter on the edge 
of extinction, whereas others have 
shown encouraging signs of recovery. 
The Eastern Pacific population of gray 
whales (Eschrichtius robustus) was hunted 
to low levels in the 19th century but was 
protected from commercial harvest by the 
International Whaling Commission in 1947. 
The population then grew to almost 27,000 
by 2016 (1), which is near the upper range 
of estimates of pre-whaling abundance (2). 
Most population models assume that after 
this recovery, gray whales would reach a 
relatively stable equilibrium. On page 207 
of this issue, Stewart et al. (3) challenge this 
assumption by documenting boom-and- 
bust cycles in abundance, driven by surpris- 
ingly large and rapid changes in their food 
supply over the past three decades. 

Gray whales make one of the longest an- 
nual migrations of any mammal, from breed- 
ing grounds in Baja California, Mexico, to 
feeding grounds in the Bering and Chukchi 
seas, between Alaska and Russia. These 
whales feed by filtering crustacean prey us- 
ing plates of baleen suspended from their 
upper jaws. Their near-shore migratory 
route has allowed researchers to collect a 
rich trove of data on abundance, reproduc- 
tion, mortality, and body condition over the 
past 50 years; Stewart et al. used these da- 
tasets to construct a demographic model of 
the population. In their model outputs, car- 
rying capacity—the number of whales that 
can be supported by the ecosystem—dem- 
onstrated a large degree of annual varia- 
tion, caused by changes in prey availability 
and sea ice cover on the feeding grounds. 
In turn, these changes were driven by varia- 
tion in the sub-Arctic climate. Large reduc- 
tions in annual carrying capacity caused 
substantial mortality events in 1988, 1999, 
and 2019, which led to large decreases in 
gray whale abundance. 
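
To make the idea of a fluctuating carrying capacity concrete, the short Python sketch below steps a discrete logistic model in which the carrying capacity changes from year to year. It is not the demographic model of Stewart et al.; the function name, the growth rate, and the carrying-capacity series are illustrative assumptions chosen only to show how a temporary drop in carrying capacity can turn steady growth into a decline.

```python
def simulate_abundance(n0, r_max, carrying_capacity_by_year):
    """Discrete logistic growth with a carrying capacity that varies by year.

    All inputs are hypothetical; this is not the model of Stewart et al.
    """
    abundance = [float(n0)]
    for k in carrying_capacity_by_year:
        n = abundance[-1]
        # Growth is positive when n is below k and negative when n exceeds k.
        n_next = n + r_max * n * (1.0 - n / k)
        abundance.append(max(n_next, 0.0))
    return abundance


# Illustrative scenario: 10 stable years, 2 years of reduced capacity, 10 recovery years.
k_series = [25000] * 10 + [15000] * 2 + [25000] * 10
trajectory = simulate_abundance(n0=20000, r_max=0.06, carrying_capacity_by_year=k_series)
for year, n in enumerate(trajectory):
    print(year, round(n))
```

In this toy scenario, abundance rises while it is below the carrying capacity, falls during the two low-capacity years, and then resumes growing, which is the qualitative boom-and-bust pattern described above.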

When population dynamic models were 
first applied to whales in the past century, 
few scientists could have imagined the 
magnitude and speed of changes in ocean 
environments caused by a rapidly warming 
climate. Whales were assumed to be largely 
buffered from environmental variation by 
their enormous size, mobility, and slow life 
histories. But from a variety of recent stud- 
ies, it is clear that climate change is affect- 
ing the ecology of these animals in ways 
that can exacerbate existing conservation 
problems or create new ones (4). 

The most immediate effects of a chang- 
ing climate on whale populations are modi- 
fications to their patterns of distribution, 
which are typically driven by prey availabil- 
ity. These are also the most straightforward 
effects to document. For example, over the 
past decade, many North Atlantic right 
whales (Eubalaena glacialis) shifted their 
primary summer feeding grounds from the 
Gulf of Maine to the Gulf of St. Lawrence, 
resulting in an increase in mortality from 
ship strikes and entanglement in fishing 
gear in Canadian waters (5). Similarly, the 
marine heat wave that occurred from 2014 
to 2016 off the US West Coast resulted in a 
shoreward shift in the distribution of hump- 
back whales (Megaptera novaeangliae) and 
an increase in the number of entanglements 
in the Dungeness crab fishery (6). 

Other populations of whales are ben- 
efiting from climate change, at least in the 
short term. For example, humpback whales 
feeding on krill along the western Antarctic 
Peninsula are taking advantage of longer 
feeding seasons that are due to reductions 
in sea ice, resulting in excellent body condi- 
tion (7) and high rates of fecundity (8). But 
what will happen when the sea ice, on which 
the krill ultimately depend, disappears? 

Stewart et al. describe two factors that 
may be responsible for the swings they 
observed in the Eastern Pacific gray whale 
population. The whales feed on amphipods, 
a low-trophic-level crustacean prey that is 
affected directly by environmental fluctua- 
tions. In addition, gray whales use a capital 
breeding strategy, in which they feed in- 
tensively during the summer and draw on 
stored energy reserves to fuel their long mi- 
grations and the costs of reproduction dur- 
ing the remainder of the year. Interannual 
variation in the duration of their feeding 
season, caused by the timing of sea ice for- 
mation and breakup, can thus affect their 
ability to store enough energy during the 
critical summer feeding period. 

How applicable are the lessons from gray 
whales for other species of baleen whales? 
Some species, such as humpback whales, 
have relatively broad diets that include 
both crustaceans and fish and are adept at 
prey-switching when environmental condi- 
tions change (9). This behavior may allow 
humpback whales to be more buffered than 
gray whales from environmental variation. 
By contrast, blue whales (Balaenoptera 
musculus) feed almost entirely on krill and 
track their prey closely, even when its dis- 
tribution changes (10). If Stewart et al. are 
correct, then species with a narrow dietary 
niche composed of low-trophic-level prey 
species should be expected to exhibit con- 
siderable variation in their demography as 
environmental conditions change. As more 
baleen whales recover from overhunting in 
the coming decades, these hypotheses can 
be tested. 

Today, Eastern Pacific gray whales expe- 
rience very limited levels of human-caused 
mortality. Stewart et al. included mortal- 
ity from entanglements and ship strikes in 
their model and concluded that these fac- 
tors could not have caused the changes in 
abundance that they observed. But what if 
this population is subjected to higher levels 
of anthropogenic mortality in the future? 
The paradigm used to manage whales and 
other marine mammals in US waters relies 
on biological reference points to establish 
limits on the number of whales that can be 
removed from each population by human 
activities. These reference points, known 
as potential biological removal levels, were 
developed by using simulation models that 
assume that populations will remain at relatively 
constant carrying capacity (11). The 
boom-and-bust cycles observed by Stewart 
et al. do not fit this paradigm, and it is un- 
clear how human-caused mortality would 
be managed on top of such a large degree 
of climate-induced variation in abundance. 
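
The article does not spell out how a potential biological removal (PBR) level is computed. Under the US Marine Mammal Protection Act, it is defined as the product of a minimum population estimate, one-half of the maximum net productivity rate, and a recovery factor between 0.1 and 1.0. The Python sketch below applies that formula to two hypothetical abundance values to illustrate why a limit set at a population peak may be too high after a decline. The input numbers are assumptions for illustration, not values from Stewart et al. or from any stock assessment.

```python
def potential_biological_removal(n_min, r_max, recovery_factor):
    """PBR = N_min * (1/2 * R_max) * F_r."""
    if not 0.1 <= recovery_factor <= 1.0:
        raise ValueError("recovery factor must lie between 0.1 and 1.0")
    return n_min * 0.5 * r_max * recovery_factor


# Hypothetical inputs: a "boom" and a "bust" abundance, a 4% maximum
# productivity rate, and a recovery factor of 1.0.
pbr_boom = potential_biological_removal(n_min=27000, r_max=0.04, recovery_factor=1.0)
pbr_bust = potential_biological_removal(n_min=14000, r_max=0.04, recovery_factor=1.0)
print(round(pbr_boom), round(pbr_bust))
```

Because the limit scales with the abundance estimate, a removal level calculated during a boom would overstate what the population could sustain during a bust unless it is recalculated promptly.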

During the past century, commercial 
whaling removed almost 3 million large 
whales from the world’s oceans (12). The 
era of unregulated hunting of whales 
is largely behind us, but the findings of 
Stewart et al. remind us that the recov- 
ery of these populations may not be as 
straightforward as expected in the era of a 
rapidly changing climate. 

Chiral molecules to transmit electron spin 

Electron transfer through chiral molecules displays a strong spin preference 


By Joseph E. Subotnik 


Understanding how electrons move 
through molecules and carry energy 
with them has been crucial for the 
development of multiple technologies, 
including photovoltaic cells and light- 
emitting diodes. The standard theory 
(1) establishes that electron transfer (ET) can 
be understood by considering energy conser- 
vation and the movement of electrons and 
nuclei. However, this view of ET has been 
challenged with the observation of chirality- 
induced spin selectivity (CISS), whereby elec- 
trons with one spin move differently through 
a material than do electrons of the other spin 
(2, 3). On page 197 of this issue, Eckvahl et al. 
(4) report a CISS signal for an isolated chi- 
ral molecular system in a liquid crystal en- 
vironment, after excitation with light. These 
results suggest that the standard ET theory 
should be modified to include both energy 
and total (orbital plus spin) angular momen- 
tum conservation, thereby opening the door 
to new CISS applications, including “green” 
hydrogen generation. 

The spin of an electron is an internal ro- 
tation of the elementary particle that was 
historically considered peripheral for ET. 
Nevertheless, over the last 24 years, the CISS 
effect (2, 3) has been demonstrated directly 
through a range of photoemission experi- 
ments (5) and indirectly through a variety of 
measurements, including spin-valve devices 
in which magnetoconductance (electrical 
conductivity change that results from an ap- 
plied magnetic field) was measured through 
chiral molecules (6) and magnetic-field de- 
pendent adsorption experiments. However, 
until now, the CISS effect has always been 
observed in the vicinity of a semiconduc- 
tor or metal, where there are many mobile 
electrons and a well-defined direction along 
which spin—like any angular momentum— 
can be measured. Remarkably, Eckvahl et al. 
demonstrate CISS in an environment with- 
out a solid or surface of any kind nearby. 

To demonstrate a CISS effect far from a 
surface, Eckvahl et al. had to overcome some 
inherent experimental difficulties. For CISS, 
all spins are polarized in the molecular frame 
(that is, relative to a molecular bond direc- 
tion), and molecules are oriented randomly 
in a disordered solution, which means that 
such spin signals cancel each other out in 
liquids. To that end, Eckvahl et al. employed 
a liquid crystal to achieve partial alignment 
of the molecules relative to a fixed known di- 
rection in space. Thereafter, as a probe, they 
used electron paramagnetic resonance (EPR), 
whereby a molecule with unpaired electrons 
is placed under a magnetic field and the re- 
sulting different spin alignments lead to a 
small splitting of spin energy levels in the 
microwave domain. The molecule is then ex- 
cited with microwaves, which results in mea- 
surable transitions between the occupied and 
unoccupied spin energy states. 
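
As a rough check on why the spin splitting falls in the microwave domain, the Python sketch below evaluates the standard Zeeman relation for an electron spin, in which the resonance frequency equals g times the Bohr magneton times the field, divided by Planck's constant. The field strength of 0.35 T is an assumed, typical laboratory value and is not taken from Eckvahl et al.; the free-electron g-factor is likewise used only as an approximation for a molecular radical.

```python
G_ELECTRON = 2.00231930436      # free-electron g-factor (dimensionless)
BOHR_MAGNETON = 9.2740100783e-24  # joules per tesla
PLANCK = 6.62607015e-34           # joule seconds


def epr_resonance_frequency_hz(field_tesla, g_factor=G_ELECTRON):
    """Frequency matching the Zeeman splitting: f = g * mu_B * B / h."""
    return g_factor * BOHR_MAGNETON * field_tesla / PLANCK


# Assumed field of 0.35 T (illustrative, not from the study).
frequency = epr_resonance_frequency_hz(0.35)
print(f"{frequency / 1e9:.2f} GHz")
```

A field of a few tenths of a tesla gives a resonance near 10 GHz, which is why microwaves are used to drive the transitions.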

Eckvahl et al. studied molecules that com- 
prise an electron donor (D), a bridge (B), 
and an electron acceptor (A) (see the figure). 
Chirality is introduced by B, which can be 
either R (right-handed) or S (left-handed), 
depending on the direction in which B polar- 
izes light. Recognizing that the EPR spectra 
of molecules with R or S chirality can arise 
from a partial (rather than complete) spin 
preference for ET from D to A (7, 8), Eckvahl 
et al. have fit the EPR absorption curves with 
standard rate equations and estimate that, 
for this molecular experiment, the prob- 
ability for one electron with a specific spin 
to transfer from D to B to A is at least 40% 
greater than for the electrons with the other 
spin to transfer. 
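
One way to read the 40% figure is as a ratio of transfer probabilities for the two spins. Under that reading, which is an interpretive assumption rather than a definition taken from Eckvahl et al., the Python sketch below converts the ratio into a spin polarization using the common convention of difference over sum.

```python
def spin_polarization(rate_ratio):
    """Polarization (p_fav - p_unf) / (p_fav + p_unf) for a ratio p_fav / p_unf."""
    if rate_ratio <= 0:
        raise ValueError("rate ratio must be positive")
    return (rate_ratio - 1.0) / (rate_ratio + 1.0)


# A favored spin that transfers 40% more often corresponds to a ratio of 1.4.
polarization = spin_polarization(1.4)
print(f"{polarization:.3f}")
```

With a ratio of 1.4, the polarization is one-sixth, so a preference that sounds large in relative terms still leaves a substantial share of transfers carrying the other spin.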

The underlying physical mechanism for 
CISS remains unknown; it is even unclear 
whether a single mechanism or multiple 
mechanisms underlie the range of CISS observations. 
The leading source of confusion 
is that, for electrons in organic materials, the 
spin of an electron is coupled very weakly to 
the electron’s overall orbital motion—known 
as spin-orbit coupling, SOC—such that for 
an electron traveling through a large or- 
ganic molecule, there is simply no time for 
any meaningful change of spin (9). A leading 
hypothesis to explain CISS taking place on 
a surface even with a weak molecular SOC 
is that the CISS effect is caused by electron- 
electron interactions at the molecule-metal 
interface. According to that theory, the rel- 
evant SOC arises from the underlying inor- 
ganic metallic substrate (9-11). 

Electron spin transfer depends on molecular chirality 

Only electrons with a certain spin are efficiently transferred when the donor-bridge-acceptor (D-B-A) molecule 
has the appropriate chirality (R, S). This is determined by electron paramagnetic resonance (EPR) spectra 
(shown here for illustration purposes). This selection is the chirality-induced spin selectivity (CISS) effect (top 
and middle). Achiral molecules do not affect electron spin transfer (bottom). 

For the experiment of Eckvahl et al., there 
is no substrate from which to share SOC, and 
so the underlying CISS mechanism must be 
different. A proposed explanation (12) is that 
nonequilibrium nuclear motion (13, 14) (i.e., 
phonons, a collective excitation of a network 
of atoms) may be responsible for impart- 
ing angular momentum to electronic spins. 
Nevertheless, this effect has never been di- 
rectly observed by, for example, changing the 
nuclear mass of atoms (by using different iso- 
topes). Interestingly, the EPR data in Eckvahl 
et al. imply that deuterating (replacing hydrogen 
with deuterium, a “heavier” isotope) 
the D-B-A system leads to a decrease in the 
CISS signal. However, the authors attribute 
this change to the difference in the nuclear 
spins of hydrogen versus deuterium, and not 
to any nuclear vibrational effects. 

The field of spin-selective ET is ripe for 
future discovery. Eckvahl et al. have fash- 
ioned their study on the possibility of using 
CISS as a means of transmitting quantum 
information in the form of electronic spin (a 
candidate for a qubit, the basic unit of infor- 
mation in quantum computing), but equally 
important is the possibility of using the CISS 
effect for electrochemical and photochemi- 
cal purposes. The strongest means by which 
spins interact with each other is through 
exchange and the Pauli exclusion principle, 
which forbids two electrons with the same 
spin to be at the same point in space at the 
same time. Thus, it can be expected that 
CISS dynamics will play a role in multi- 
electron transfer processes, for example, in 
water-splitting experiments where multiple 
redox steps take place to generate “green” 
hydrogen fuels (15). Moreover, because the 
CISS effect can take place away from a sur- 
face, it may be useful for magnetic-field de- 
tection, perhaps explaining the physics of 
bird navigation (7). 

SPACE GOVERNANCE 


One million (paper) satellites 


Radiofrequency filings warn of more congestion in space, but also offer possible remedies 


By Andrew Falle, Ewan Wright, Aaron Boley, Michael Byers 


The occupation of Earth orbits by 
large constellations of satellites has 
received considerable attention in 
recent years. About 4500 Starlink 
and 630 OneWeb satellites are on 
orbit as of July 2023 (1), but this 
is only the beginning. Recent filings for 
radio spectrum with the International 
Telecommunication Union (ITU) suggest 
that a dramatic increase in satellite num- 
bers is possible, much more than the tens 
of thousands often reported. Constellations 
much larger than SpaceX’s Starlink have 
been filed, including a 337,320-satellite 
constellation named Cinnamon-937 that 
was filed in September 2021. By treating 
orbital space as an unlimited resource, hu- 
manity is creating serious safety and long- 
term sustainability challenges to the use of 
low Earth orbit (LEO), including science 
conducted from space and the ground. 
The ITU filings are the warning, and also 
part of the solution. There is urgent need 
for the ITU and its member states to adopt 
meaningful controls. 

This image of the double star Albireo in Cygnus, 
taken on 26 December 2019, shows Starlink satellites 
moving across the field. 

Filing with the ITU occurs because satellites 
require access to the radiofrequency 
spectrum to communicate with ground 
stations and carry out mission operations, 
such as providing communication services 
or returning Earth observation data for 
analysis. The filings are made by national 
governments on behalf of private satellite 
companies and government agencies, 
under rules and procedures set out in the 
ITU’s Constitution, Convention, and Radio 
Regulations (2, 3). Different bands within 
the radiofrequency spectrum are used for 
different applications, such as fixed or mo- 
bile satellite services, and their associated 
filing types vary. Some operate on a first 
come, first served basis, whereas others 
leave satellite operators to coordinate among 
themselves. In all cases, the ITU requires nations 
to submit information on behalf of operators about 
their proposed satellite networks to enable 
coordination with existing systems. That information 
is examined here. 

Using the ITU’s “as received” database, we extracted 
filings for constellations with 10 or more satellites 
and found that more than 300 constellations 
representing more than 1 million satellites were 
filed between 1 January 2017 and 31 December 2022. 
This is greater than 115 times the number of 
operational satellites currently in orbit. Moreover, 
among these 300 constellations, there are more than 
90 that comprise more than 1000 satellites each. 
Twenty-three have more than 5000 satellites, and 
eight have more than 10,000 satellites. The largest 
single filing is Cinnamon-937, with 337,320 
satellites. For all of these numbers, we have 
attempted to exclude repeated filings for the same 
satellites. The cumulative number of satellites filed 
over the examined period is shown in the figure, with 
most of these satellites destined for LEO. 

Most of the proposed constellations were filed by 
China (about 65) and the United States (about 45); 
however, very large (more than 10,000 satellites) 
constellations have also been filed by Rwanda, 
Germany, Spain, Norway, France, and Solomon Islands. 
The “operating agencies” for the 20 largest 
constellations come from nine nations and include 
well-established satellite companies, start-ups, and 
government agencies. 

A surge in satellites 
Cumulative growth in the number of satellites filed in the International 
Telecommunication Union (ITU) “As Received” database from 1 January 2017 to 
31 December 2022. 

The growth of objects in LEO has consequences. The 
environment already contains considerable mass from 
thousands of operational satellites and tens of 
thousands of pieces of tracked debris, including 
defunct satellites and abandoned rocket bodies. It is 
estimated that there are millions of pieces of 
smaller, untracked, and potentially dangerous debris 
(4). The addition of hundreds of thousands of new 
satellites would greatly increase the complexity of 
operations and the risk of on-orbit collisions. 
Moreover, reentries from medium-to- 
large satellites and the rocket bodies used 
for launches would pose a growing risk to 
people on the ground, at sea, and in aircraft 
(5). Light reflecting off satellites would con- 
tinue to disrupt astronomy through streaks 
and glints, and radio astronomy would be- 
come further limited by transmissions and 
electronics noise (6). 

Most likely, many of the filed satellites 
will never be launched. There are many 
reasons why satellite projects do not come 
to fruition, including funding problems, 
withdrawals of political support, and engi- 
neering issues. In some cases, large filings 
could be a calculated move by governments 
or companies: applying for more satellites 
than they intend to launch, possibly with 
an aim to attract investors or sell the rights 
to spectrum. In other cases, a company 
might engage in “spectrum warehousing” 
by acquiring spectrum rights that it hopes 
to use later, as technology develops, cus- 
tomer demand increases, or additional in- 
vestors are found. 

There are past examples of over-filing 
for satellites. In the 1980s and 1990s, a 
company called Tongasat, operating on 
behalf of the Government of Tonga, filed 
for geosynchronous orbit (GSO) slots and 
began leasing them to foreign satellite 
companies (7). Cases of possible over- 
filing have also occurred regarding LEO, 
including in 2015 when large filings were 
submitted for OneWeb, SpaceX, Thales 
Alenia Space, Telesat, and 
KleoConnect within months 
of each other (8). 

Regardless of the reason, 
over-filing hampers our ability 
to identify and address potentially 
catastrophic problems 
in a timely manner. The chal- 
lenge is further complicated by 
other industry practices. Some 
companies split their satellite 
constellation across multiple 
filings with the ITU. For ex- 
ample, SpaceX’s Starlink Gen2 
constellation was submitted 
across approximately 22 fil- 
ings. This presents a challenge 
to the ITU’s efforts to protect 
GSO satellites against harmful 
radio interference from satel- 
lites in LEO, including through 
limits on equivalent power-flux 
density (EPFD), a measure of 
transmission power. It was al- 
ready difficult for the ITU to 
accurately model EPFD emis- 
sions from non-GSO satellites 
and evaluate compliance with 
EPFD limits under Article 22 of the Radio 
Regulations; splitting constellations across 
multiple filings only exacerbates the prob- 
lem (9). The ITU has identified that one 
reason why operators split filings is to ob- 
tain favourable EPFD results that would 
otherwise exceed ITU EPFD limits if they 
were submitted as a single constellation 
(10). Although it advises against this prac- 
tice, the ITU has not yet prohibited or taken 
steps to disincentivize it. 

In several cases, companies have had 
several nations file for the same constel- 
lation. Three nations—Norway, Germany, 
and the United States—submitted filings 
for SpaceX, and three others—the United 
Kingdom, France, and Mexico—submitted 
filings for OneWeb. Papua New Guinea 
has submitted filings for the US company 
Omnispace, and Solomon Islands has submitted 
filings for the Australian company 
Fleet Space. Media reports linked the 
Cinnamon-937 filings, filed by the govern- 
ment of Rwanda, to a French company 
named E-Space, founded by American entrepreneur 
Greg Wyler (11). In June 2023, 
the French government made a new filing 
for E-Space, a 116,640-satellite constella- 
tion named Semaphore-C; as a result, the 
actual extent of E-Space satellite constel- 
lations is unclear, although the numbers 
remain substantial (12). It is not clear why 
Rwanda submitted filings for E-Space, al- 
though Wyler’s previous links to the nation 
provide a possible explanation (13). 

Different nations will have different 
policies concerning ITU filings, including 
the fees charged, the degree of scrutiny 
applied, and the transparency (or obfusca- 
tion) provided with respect to the compa- 
nies involved. Companies likely consider 
these policies when deciding where to file, 
creating the possibility of flags of conve- 
nience. In the maritime domain, flag-of- 
convenience nations provide shipping 
companies with registrations for their ves- 
sels, as is required by international law, 
but do so with minimal regulation and lax 
enforcement. Unsurprisingly, these ships 
have poor safety records. At the same 
time, we do not assume or intend to imply 
that nations such as Rwanda, Papua New 
Guinea, and Solomon Islands are acting 
in bad faith. The accessibility of the ITU’s 
processes for securing radiofrequency 
spectrum offers nations a way into this 
fast-growing high-tech sector. 


CHANGING THE RULES 

If over-filing is a problem, there is a solu- 
tion available. The ITU is an effective law- 
making body through which its 193 mem- 
ber states can readily create or update rules 
that then bind them all. By ratifying the ITU 
Convention and Constitution, nations agree 
to further law-making at Plenipotentiary 
Conferences held every 4 years, and at 
World Radiocommunication Conferences 
(WRCs) held every 3 to 6 years, at which 
ITU member states review and revise the 
Radio Regulations. They do so through a 
consensus-based process that further en- 
sures that no nation is bound without its 
consent. This ability to create new rules 
without requiring a new and lengthy round 
of treaty ratifications is unusual in interna- 
tional relations although not unique, with 
the International Maritime Organization 
also able to adopt new, widely applicable 
rules with some regularity. 

The ITU does not have enforcement pow- 
ers as such; it relies on national regulators 
to ensure compliance. The ITU’s primary 
responsibility—and those of the member 
states, acting together—is to keep the rules 
abreast of new technological developments. 
Due diligence rules were first established 
at the 1997 WRC (WRC-97) in part to re- 
duce speculative filings. These require gov- 
ernments to submit information such as 
the planned spacecraft manufacturer and 
launch provider to the ITU, but they do not 
apply to all spectrum bands. Separately, in 
1997, the ITU Council introduced a cost re- 
covery scheme that eventually had the ef- 
fect of partially curbing speculative filings 
in GSO, although proposals for fees used 
explicitly to discourage over-filing were rejected 
by member states (14). At the most 
recent WRC, WRC-19, the first rules to ad- 
dress the over-filing of constellations were 
adopted, including a “milestones” approach 
in which operators must launch specific por- 
tions of their constellation within certain 
time periods after the initial filing. National 
regulators are implementing these rules: 
For example, Amazon’s Project Kuiper li- 
cense from the US Federal Communications 
Commission requires that they have at least 
half of their satellite constellation in orbit 
and operating by July 2026, or they risk los- 
ing some spectrum rights. However, at least 
in terms of deterring speculative filings, 
the milestones approach does not appear 
to have been particularly effective, with 
our data showing that filings for more than 
900,000 satellites have been made since it 
was adopted. 
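
The logic of a milestones rule can be sketched in a few lines of Python. The function below classifies a constellation against a single deployment milestone; the function name, the constellation size, the required fraction, and the exact deadline date are hypothetical values chosen for illustration and are not drawn from the Radio Regulations or from any specific license.

```python
from datetime import date


def milestone_status(deployed, filed, required_fraction, deadline, as_of):
    """Classify progress toward one deployment milestone.

    Returns "met", "pending" (deadline not yet passed), or "missed".
    """
    required = required_fraction * filed
    if deployed >= required:
        return "met"
    if as_of <= deadline:
        return "pending"
    return "missed"


# Hypothetical constellation: 1000 satellites filed, half required by the deadline.
deadline = date(2026, 7, 31)
early = milestone_status(300, 1000, 0.5, deadline, as_of=date(2025, 1, 1))
late_short = milestone_status(300, 1000, 0.5, deadline, as_of=date(2026, 8, 1))
late_ok = milestone_status(500, 1000, 0.5, deadline, as_of=date(2026, 8, 1))
print(early, late_short, late_ok)
```

A check of this kind only bites once the deadline has passed, which may help explain why very large filings can still accumulate in the years immediately after such a rule is adopted.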

Agenda items for WRC-23, which is to begin in late 
November 2023, show an awareness of the 
challenges presented by large satellite con- 
stellations and could lead to progressive 
changes to the Radio Regulations; in other 
words, up-to-date rules. Post-milestones 
reporting is being explored, which would 
require nations to report major satellite 
failures so that the ITU has accurate data. 
Technical solutions are being developed 
to accurately model aggregate EPFD pro- 
duced by all non-GSO satellites, and these 
will help the ITU to evaluate whether con- 
stellations are complying with EPFD lim- 
its. Consideration is also being given to 
limiting the allowed altitude deviations of 
non-GSO satellites to facilitate regulatory 
control and prevent interference (9). Other 
sustainability issues, such as orbital con- 
gestion and light and radio pollution from 
satellites, might indirectly benefit if these 
updates were made. 

Although the ITU is paying attention to 
at least some of the issues emerging with 
respect to large constellations, the rapid 
growth in the cumulative number of satel- 
lites being filed remains a concern. Better 
rules will have to be adopted, either at 
WRC-23 or at the next WRC in 2026-2027. 
These might include some kind of limits on 
the number of satellites in individual constellations 
or in certain highly desirable 
orbits. Other options include higher fees 
for large filings or bonds that are repaid 
after satellites are deorbited. The possibility 
of flags of convenience should also be 
addressed, along with the possibility that 
larger constellation operators may be filing 
for speculative reasons. The agenda for that 
next conference will be adopted at WRC- 
23, making negotiations over the next few 
months critically important. 

Satellite constellations offer great ben- 
efits to society, but their unchecked pro- 
liferation threatens everyone’s interests, 
including astronomers, scientists who rely 
on Earth imaging satellites, and all other 
space users. It is time for the ITU and its 
member states to step up: Their author- 
ity over the radio spectrum makes them 
distinctly able to manage LEO, a finite 
resource that belongs—and should remain 
accessible—to all humankind. 

CLASSICS REVISITED 


Galileo's comet rebuttal 


A foundational text that articulated the modern scientific method turns 400 


By Alex Gomez-Marin 


This October marks the 400th anniversary 
of the publication of The 
Assayer (Il Saggiatore, in Italian) by 
Galileo Galilei (1564-1642). A treatise 
intended to address a major contro- 
versy on the nature of three comets 
observed in Europe in 1618, the book came 
to represent a major landmark in the his- 
tory of science. Contained in Gali- 
leo’s discussions of “science as a 
method of demonstration and rea- 
soning capable of human pursuit” 
is an articulation of the scientific 
method as we now know it. 

Some have “advanced ridiculous and impossible 
opinions against me,” writes Galileo, dramatically, 
in the book’s opening, referring specifically to 
criticisms leveled by the Jesuit priest and 
mathematician Orazio Grassi. Writing under the 
pseudonym Sarsi, in 1619 Grassi had published 
Libra Astronomica ac Philosophica, which sought to 
refute Galileo’s claim that the comets could be 
visual illusions rather than celestial bodies. 
(Grassi was later proven right.) Thus, “forced to 
act by this unexpected and uncalled-for treatment,” 
Galileo continues, “I break my previous resolve to 
publish no more.” 

The Assayer was written as a letter to Virginio 
Cesarini, an Italian poet and chamberlain to Pope 
Gregory XV and his successor Pope Urban VIII, to 
whom Galileo dedicated the book and under whose 
pontificate the astronomer was tried 10 years later. 
It contains, among other prescient observations, a 
revelatory passage about the mathematical 
intelligibility of nature. “Philosophy is written in 
this grand book, the universe, which stands 
continually open to our gaze,” Galileo writes. “But 
the book cannot be understood unless one first learns 
to comprehend the language and read the letters in 
which it is composed. It is written in the language 
of mathematics, and its characters are triangles, 
circles, and other geometric figures without which it 
is humanly impossible to understand a single word of 
it; without these, one wanders about in a dark 
labyrinth.” 

The Assayer 
Galileo Galilei 
1623 

Galileo proceeds to brilliantly expound 
the primacy of empirical observations, 
which, coupled with precise mathematical 
definitions, can offer testable predictions 
to decide what to believe in our quest for 
truth. One must, however, beware of mischaracterizing 
him, as Galileo was not advocating 
for a cold and detached science. 
He was mindful of the limits of sense and 
reasoning and supplemented his great abstraction 
and observation skills with intuition 
and imagination. 

A disagreement about the nature of three comets 
observed in 1618 led Galileo to draft this treatise. 

Galileo also stamped another indel- 
ible idea in The Assayer, one with conse- 
quences for the study of mind and matter 
that have percolated through the centuries 
and continue to affect 21st-century con- 
sciousness studies. Toward the end of the 
book, he writes: “Without the senses as 
our guides, reason or imagination unaided 
would probably never arrive at quali- 
ties like these. Hence I think that tastes, 
odors, colors, and so on are no more than 
mere names so far as the object in which
we place them is concerned, and that they
reside only in consciousness.” Galileo contrasts
the “primary phenomena of motion
and touch” with sensations, which he re- 
gards as “secondary qualities” with “no 
real existence save in us.” Because Galileo
programmatically excluded subjective experience
from the purview of science, we are still
struggling to untie the Galilean knot that
is the hard problem of consciousness.

The book is also an example of epistemic 
humility in the face of dogmatic opinions, 
appeals to authority, and the profligacy of 
logical argumentation. “[W]e must resort 
to experiments for settling such questions,” 
Galileo insists, and, when that is not pos- 
sible, “suspend our argument and wait 
quietly until some new comet came along.” 
Referring to nature, he remarks that “she 
employs means we could never think of 
without our senses and our experiences 
to teach them to us—and sometimes even 
these are insufficient to remedy our lack of 
understanding.” 

The abridged English translation of 
The Assayer by the late American historian 
of science Stillman Drake—a world authority
on Galileo—can be found online (1) and
read easily in the span of an hour. The 
original Italian version is considerably lon- 
ger, beautifully illustrated, and can be read 
online too (2). Those who take the time to 
do so will be rewarded with insight into the 
great astronomer’s singular character and 
seminal ideas.

Freedom from free will


A scientist presents a case for a predetermined future 


By Uri Maoz 


We all sometimes behave in ways
that seem to conflict with our goals 
and intentions. One person might 
struggle to resist a favorite comfort 
food despite knowing that a differ- 
ent option will be more nutritious. 
Another may repeatedly snooze their alarm 
and miss their morning workout. Still others 
may wish to spend more time with family but 
instead find themselves mindlessly browsing 
social media. If we only had more willpower, 
the conventional wisdom goes, we could eat 
healthier, exercise regularly, and spend more 
time with loved ones. In Determined: A Sci- 
ence of Life without Free Will, neuroscientist 
Robert Sapolsky argues that we have no free 
will and that such choices are thus actually 
determined by factors beyond our control. 

Our genetic and epigenetic makeup, to- 
gether with environmental and cultural 
factors, make us who we are and determine 
how we behave, Sapolsky claims. Whether a 
person makes it to the gym or opts for a dec- 
adent dessert is little more than a function 
of the architecture of their prefrontal cortex 
and the neurotransmitters and hormones 
circulating in their body, all of which in turn 
are physical manifestations of a person’s ge- 
netic inheritance and the environment in 
which they spent their formative years. 

This claim is not new. It is a version of 
the “consequence argument,” popularized
by philosopher Peter van Inwagen in 1983, 
which states that in a deterministic uni- 
verse—one where all events are completely 
determined by an initial state and the laws 
of nature—all human actions are a conse- 
quence of the laws of nature and events in 
the remote past and are therefore not under 
our control. Nevertheless, Sapolsky’s cover- 
age of the relevant science is first-rate and 
very much worth reading. 

The book begins with the neuroscience 
of volition, following the now-famous ex- 
periment conducted by Benjamin Libet in 
1983, which purported to show that the hu- 
man brain registers the decision to move 
before an individual consciously decides to 
do so. Sapolsky then strives to demonstrate 
how various factors—many of them uncon- 
scious—influence our intentions and actions.
He describes scientific evidence that
those influences may occur a few seconds 
before our actions. Exposure to a disgusting 
smell, for example, can make some people 
less accepting of gay marriage (1). But such
influences may also occur minutes, hours, 
or days before. The influences may even 
begin years earlier—childhood stress can 
disrupt impulse control in adulthood (2)— 
or centuries before a particular behavior 
occurs. Individuals raised in a collectivist
culture have been found to avoid obstacles
when walking, for example, whereas those
raised in individualistic cultures physically
remove the obstacles (3, 4).

Sapolsky’s decades of experience studying
the effects of the interplay of genes and
the environment on behavior shine brightly
in these discussions. In particular, he argues
against the claim that “luck” evens
out over time, with fortune and misfortune
striking most people in equal measure over
the years, an idea favored by philosopher
Daniel Dennett and others. Instead, he provides
compelling examples that bad luck
compounds, meaning that many who are
born “unlucky” have little chance of getting
ahead. In later chapters, he convincingly
argues against claims that chaos theory,
emergent phenomena, or the indeterminism
offered by quantum mechanics provide
the gap required for free will to exist.

Sapolsky spends the second half of the
book grappling with the consequences of a
world that lacks free will. If we accept that
humans are not responsible for their misdeeds,
will some of us feel emboldened to
behave badly? He thinks not, citing epilepsy
as a case in point. Once thought to be the result
of demonic possession, it is now understood
as a neurological disorder. In addition
to leading to the development of effective
treatments, this framework helped us to rethink
to what degree an individual undergoing
an epileptic seizure is responsible for
any potential negative outcomes.

Although he is careful not to conflate
determinism with the inability to effect
change in the world, Sapolsky’s dismissive
attitude toward how determinism might
be compatible with free will is one of the
book’s weak points. Indeed, he sets the bar
very high for free will (“Show me a neuron
being a causeless cause”). This well-written
book is nonetheless worth reading. Better
yet, pair it with Kevin Mitchell’s book Free
Agents: How Evolution Gave Us Free Will,
also publishing in October 2023, which
makes the opposite argument, and then decide
for yourself whether you had a choice
to do so or it was all predetermined.

Determined: A Science of Life without Free Will
Robert M. Sapolsky
Penguin Press, 2023. 528 pp.

Factors beyond our control determine how we behave, argues Sapolsky.
Iranians gather in the Hamun Lake basin to protest the government’s policies for preserving the lake, which has completely dried up.

Social consequences
of Iran’s water crisis


Iran is facing a water crisis (1). The country
is located in a dry region and has suffered 
from extended drought (2). Unsustainable 
water practices and climate change have 
exacerbated the shortages (2). Iran must 
address the crisis to curtail the social con- 
sequences of water scarcity. 

The water crisis has led to decreased 
quality of life across the country. More 
than 90% of water consumption in Iran 
is devoted to agriculture (3), and reduced 
water availability has led to decreased crop 
yields (3), which has caused food deficien- 
cies, higher prices, and financial damages 
for farmers (2). Emigration from rural 
areas, in which agriculture is the main 
source of income, has become more com- 
mon (4). The reduction of water resources 
has also increased water-related diseases 
such as cholera, typhoid, and diarrhea 
(5). The increased load has weakened the 
healthcare system and increased the mor- 
tality rates in some areas (6). In addition, 
the water crisis has caused social unrest in 
some parts of the country (4). 

To overcome the crisis, Iran’s govern- 
ment needs to improve water use effi- 
ciency, management, and sustainability. 
Modern irrigation methods, such as drip
irrigation, can decrease agricultural water 
loss (7). The government can adopt effi- 
cient farming strategies where possible, as 
well as incentivize independent farmers 
to apply such techniques (8). Iran should 
also invest in infrastructure such as dams, 
canals, and water treatment plants (9) and 
regulate the withdrawal of groundwater 
(10). Given the effects of climate change, 
industrial development plans should 
anticipate increasing challenges in water 
availability. Legislative bodies should con- 
sider the potential advantages of moving 
water-intensive industries from inland to 
the coast of the Gulf of Oman, where sea- 
water can replace freshwater for purposes 
such as cooling (11), as part of long-term
water management plans. Finally, Iran 
should work to increase public awareness 
about water conservation. The government 
should fund campaigns to educate the 
people about the impact of water on their 
lives through television, radio, and social 
media (12). By taking these measures, Iran 
can reduce the effects of the water crisis 
and guarantee its future water supply. 
Rethinking the design of
Saudi Arabia’s linear city 


To accommodate a staggering population of 
9 million people within a compact area of 
34 km² (1, 2), Saudi Arabia has proposed an
ambitious urban project: the construction 
of a new city near the Red Sea. The design 
of the city, known as “The Line,” includes 
500-meter-tall skyscrapers positioned in two 
rows spanning 170 km in length (1, 2). Saudi
Arabia is touting the city, with its high popu- 
lation density, use of renewable energies 
and artificial intelligence, and exclusion of 
automobiles, as an example of sustainability 
(1, 2). However, regardless of its carbon foot- 
print, the plan will likely be detrimental to 
wildlife. Saudi Arabia should instead adopt 
an environmentally friendly design. 

The habitat fragmentation and loss of 
population connectivity in the planned 
construction area pose substantial threats 
to global biodiversity (3). “The Line” is sur- 
rounded by areas of singular importance 
for the biodiversity of hyperarid ecosystems, 
including rare species sites (4) and Key 
Biodiversity Areas (5). Moreover, the city 
cuts through intact wilderness areas (4) 
and one of the country’s proposed protected 
areas (6). It is also in close proximity to a 
potential wildlife corridor (4). 

“The Line” could put migratory birds, 
which frequently fly through the region (7), 
at heightened risk as well. Skyscrapers sur- 
rounded by glass facades and gardens within 
a desert area could affect birds’ behavior (8). 
Birds attracted by woodland areas or night- 
time lights may suffer from collisions with 
glass surfaces, which already cause hundreds 
of millions of bird fatalities each year in the 
United States and Canada (9). If birds avoid 
light (10), the city could alter their migration
routes by hundreds of kilometers. 

Urban planners have argued that using 
a circular instead of linear design would 
enhance the city’s habitability (2). Although 
any large city would present an obstacle to 
some wildlife movement and a potential 
distraction to migrating birds, a circular or 
square fractal shape would allow wildlife 
to circumvent the area more easily. Saudi 
Arabia should revise its plans to facilitate 
the creation of an ecological network that 
promotes connectivity (11, 12).

Promoting biodiversity requires a holistic 
approach derived from interdisciplinary 
collaboration between landscapers and 
biologists. Before construction begins, Saudi 
Arabia should conduct a biodiversity assess- 
ment. The plan should then be revised to 
remove barriers and add corridors for land 
animals; incorporate ponds, streams, and 
bird feeders to support migratory birds;
and integrate native plants and preserve nat- 
ural habitats within the city, including trees, 
shrubs, and ground cover. In addition, light 
pollution should be minimized to mitigate 
disruption to nocturnal species. By incor- 
porating these features, Saudi Arabia can 
improve the city’s sustainability, benefiting 
both human inhabitants and biodiversity. 
Protect central Chile’s
biodiversity 


The recently created Santiago Glaciers 
National Park covers 75,000 ha in the Andes 
Mountains in central Chile (1), about 60 km
east of the capital city Santiago, which is 
home to 40% of the country’s population. 
The park, located in the hotspot of biodiver- 
sity that covers all of central Chile (2), will 
protect several endangered species, includ- 
ing the Andean cat (Leopardus jacobita) (3), 
and 368 glaciers (1), which are crucial water 
reserves for the populations located down- 
stream. However, to provide adequate pro- 
tection to the region, Chile should expand 
the park from its current limit of 3600 m (1)
down to an altitude of 1000 m. 

Given that Chile has the world’s 16th 
highest water shortage, a result of climate 
change and ever-growing demand (4, 5), 
protecting the central region should be a 
priority. However, the park covers only half 
of the 142,000 ha that citizen organiza- 
tions requested (6). The 67,000 ha below its 
border support high biodiversity and are 
severely threatened by mining (7) and other 
human activities that can increase the risk of 
wildfires (8). Extending the park by 67,000 
ha, to include all areas above 1000 m, would 
greatly expand its protection of biodiversity 
and associated ecosystem services. By facili- 
tating ecotourism and restricting mining 
and construction, an expanded park would 
also increase access to nature for the large
urban population in nearby cities.

As a signatory to the Kunming-Montreal
Global Biodiversity Framework (9), Chile 
has committed to protecting 30% of its ter- 
restrial ecosystems. However, most of the 
21.5% of the country’s land that is already 
part of the National System of State- 
Protected Areas (SNASPE) is in Patagonia 
(10) in southern Chile. Despite the many 
threatened ecosystems in the central biodiversity
hotspot (11), only a small fraction of
the area is protected by the SNASPE (2, 6), 
including the new park. 

The “30 by 30” goal must focus on effec- 
tively protecting Key Biodiversity Areas 
rather than simply reflecting the average
area protected across a country (12).
Opportunities for protecting biodiversity 
and maintaining key ecosystem services 
such as water provision, especially close to 
densely populated regions, must not be over- 
looked. Expanding the park’s protections to 
lower altitudes would bring Chile closer to 
its goals, protect water resources, protect 
vulnerable species, and facilitate ecotourism. 
Stacking fault mosaic

Metal alloys often get harder as you strain them because of the creation and entanglement
of line defects. Pan et al. found a different mechanism for this strain hardening in
a multi-principal element alloy at liquid nitrogen temperature. The alloy forms a mosaic
of small stacking faults oriented in different directions instead of linear defects. These
faults generate exceptional strain hardening in an alloy type where this behavior is not
generally expected. —BG
Science, adj3974, this issue p. 185


NEUROSCIENCE
Synapse in the making

Presynaptic terminals are
incredibly specialized areas
in neurons that are formed de
novo during differentiation. The
mechanisms mediating the
transport of presynaptic components
to the terminals remain
to be elucidated. Rizalar et al.
showed that phosphatidylinositol
3,5-bisphosphate-mediated
signaling guides the transport
of synaptic vesicles and active
zone proteins along the axon to
the presynaptic site in precursor
vesicles (see the Perspective
by Rivero-Rios and Weisman).
Their results provide a critical
piece of the puzzle necessary
for understanding the formation
of the machinery responsible
for neuron-to-neuron communication.
—MMa

Science, adg1075, this issue p. 223;
see also adk5037, p. 155


CHEMISTRY 

Cobalt’s persistence 

pays off 

Ruthenium and iridium com- 
plexes are advantageously 
tunable photoredox catalysts, but 
the expense of these precious 
metals is a drawback to their use. 
Lighter, more abundant metals 
have been considered unsuitable 
because of their anticipated rapid 
relaxation from photoexcited 


4 


states. Chan et al. report that Ste 


cobalt complex with conventisna 
bipyridyl ligands manifests a 
surprisingly long excited state 
lifetime, which they attribute 
to Marcus inverted region 
behavior (see the Perspective 
by Yaltseva and Wenger). The 
Earth-abundant metal is effective 
at photoredox coupling of aryl 
amides with aryl boronic acids 
and more generally opens the 
door to greater sustainability in 
this catalyst class. —JSY 

Science, adj0612, this issue p. 191;
see also adk5923, p. 153


NEUROIMMUNOLOGY 
Adult antiviral advantage 


Systemic pathogens may invade 
the central nervous system 
(CNS), which is particularly 
dangerous for newborns, and the 
mechanisms providing protec- 
tion are not well understood. Kim 
et al. determined that the structural
barriers protecting the CNS
are similar between neonatal
and adult mice, but the macrophages
localized in the dura
mater, the connective tissue that
surrounds the brain, were different.
The authors showed that
adult mice have a population of
macrophages that are localized
along the venous sinuses and
produce antiviral molecules and
chemokines that protect against
systemic viruses, but these cells
are rare in neonatal mice. These
findings emphasize the role of
immune cells in regulating viral 
entry into the CNS and highlight 
that immature immune cells, 
rather than underdeveloped ana- 
tomical barriers, may contribute 
to neonatal susceptibility to CNS 
infection. —SHR 
Sci. Immunol. (2023) 
10.1126/sciimmunol.adg6155 


MEMBRANES 
Reaction control using ice 


Polyamide membranes are 
widely used for desalination and 
are commonly made using an 
interfacial polymerization pro- 
cess. Zhang et al. modified this 
process by using an m-phenyl- 
enediamine (MPD) monomer 
frozen in ice and in contact with 
a trimesoyl chloride-hexane 


science.org SCIENCE 


PHOTOS: (TOP) T.W. VAN URK/SHUTTERSTOCK; (BOTTOM) MAX PLANCK/INSTITUTE FOR SOLID STATE RESEARCH 


solution. The use of ice chem- 
istry creates more molecular 
space within the material 
because the melting allows for 
control over the diffusion and 
reaction rates of MPD. This 
leads to a higher water flux and 
enhanced transport of smaller 
ions relative to larger ones, such 
as the separation of chloride 
from sulfate. —MSL 

Science, adi9531, this issue p. 202 


Targeted inhibition 
Cutaneous squamous cell
carcinoma (cSCC) is the second
most common skin cancer,
and it often progresses from
preneoplastic actinic keratosis.
Current modalities of treatment
for actinic keratosis to prevent
progression to cSCC are limited
by toxicity, so new options
are required. Sarin et al. have 
developed a topical inhibitor 
of the kinase MEK, NFX-179, 
that is rapidly metabolized to 
avoid systemic toxicity. The 
authors tested NFX-179 in two 
mouse models and found that 
it prevented cSCC development 
but was nontoxic, suggesting 
further evaluation of this gel for 
chemoprevention. —DLH 
Sci. Transl. Med. (2023) 
10.1126/scitranslmed.ade1844 


A tip’s eye view 

of glycoconjugates 

Many proteins, especially those 
that are secreted from eukaryotic 
cells, have sugar chains attached
to facilitate quality control or
mediate protein-protein or
cell-cell interactions. These
sugars are often complex and
heterogeneous and can be 
challenging to study by conven- 
tional structural or biophysical 
methods. Anggara et al. show 
that glycans attached to pep- 
tides and lipids can be imaged 
directly using single-molecule 
atomic force microscopy. These 
biomolecules can be applied to a 
surface by a gentle electrospray 
deposition and, if necessary, 
manipulated to stretch out 
structured regions. The authors 
observed distinct glycan con- 
figurations and imaged large 
fragments of proteins, including 
a densely glycosylated mucin. 
—MAF 

Science, adh3856, this issue p. 219 


Big but sensitive 
to the environment 


Environments are responding to 
human-induced climate warm- 
ing in a variety of ways, not all of 
them expected. Such changes 
can have large impacts on spe- 
cies and ecosystems. Responses 
to such changes may be most 
obvious in shorter-lived species, 
but Stewart et al. show that even 
some of the largest animals 
on the planet are susceptible 
to relatively minor changes 
(see the Perspective by Read). 
Specifically, they looked across a 
50-year database on gray whale 
population estimates and found 
clear evidence of rapid popula- 
tion increases and declines 
in response to changing prey 
biomass and ice cover. —SNV 
Science, adi1847, this issue p. 207;
see also adk4244, p. 159


Scanning tunneling microscopy image of a single mucin protein decorated 
with glycans that branch off from the linear polypeptide chain 




Lily bulbs have a complex 
biochemical mechanism for 
maintaining dormancy 
through winter. 


PLANT SCIENCE 


Edited by Caroline Ash 
and Jesse Smith 


Waking up from the cold 


Lily bulbs stay dormant during cold weather and only
resume growth and flowering once conditions warm
up. Pan et al. found that a protein called NUCLEAR
FACTOR-YA (NFYA) helps control the transition to
flowering by recruiting an epigenetic silencing complex.
The silencing complex down-regulates a callose synthase
enzyme, allowing pores that connect cells (plasmodesmata)
to open. Opening of the plasmodesmata facilitates intercellular
communication, allowing reactivation of growth and
flowering. The authors found that expression of VIL1, an element
of the silencing complex, correlated with the timing of
dormancy breaking across different lily cultivars. VIL1 could
therefore be used as a marker to assist lily breeding efforts,
reducing the chances of generating cultivars that break dormancy
too soon. —MRS


Nat. Plants (2023) 10.1038/s41477-023-01492-z 


Correcting the record
of a cold case


In 2012, the low-coverage genome
of the Tyrolean Iceman, or Ötzi,
was published. The Iceman is
one of the oldest human glacier
mummies originally discovered
in the Alps and likely lived between
3350 and 3120 BCE. Wang et al.
resequenced this individual using
the latest sequencing techniques
and found that recent human
contamination introduced error
into the original study. In contrast
to the earlier work, their results
showed that Ötzi had no Steppe-
related ancestry, and probably
came from an isolated Alpine
community more closely related
to Anatolian farmers, with some
gene flow from hunter-gatherers.
—CNS
Cell Genom. (2023) 
10.1016/j.xgen.2023.100377 
CORONAVIRUS
Adapting booster 
vaccines to variants 


Despite the successes of the 
primary series of COVID-19 
vaccines, waning immune pro- 
tection and immune evasion by 
variants of severe acute respira- 
tory syndrome coronavirus 2 
(SARS-CoV-2) have prompted 
the need for booster immuniza- 
tions. However, choosing the 
optimal booster regimen and 
the viral variants with which to 
update the vaccines is a chal- 
lenging task that is complicated 
by the heterogeneity of popula- 
tion immunity. In a Perspective, 
Krammer and Ellebedy discuss 
these challenges and analyze the 
evidence showing that contin- 
ued inclusion of the ancestral 
SARS-CoV-2 strains in booster 
vaccines is highly question- 
able. The authors highlight the 
importance of booster vaccina- 
tions using the most recent virus 
variants, but emphasize that 
there remain many questions 
about finding the best solutions 
for updating COVID-19 vaccines. 
—GKA 

Science, adh2712, this issue p. 157 


SYNTHETIC BIOLOGY 
Probiotic-guided
CAR T cells


Immunotherapy has proven 
highly efficacious for certain 
types of blood cancers, but the 
lower success rates for solid 
tumors remain a challenge. 
Vincent et al. designed probiotics
that could home in on and
colonize solid tumors to improve
chimeric antigen receptor 
(CAR) T cell immunotherapy. 
The two-step approach involved 
engineering a nonpathogenic 
strain of Escherichia coli, which 
delivered synthetic antigens to 
the tumor microenvironment 
and “tagged” the tumor (see
the Perspective by Bressler and
Wong). They next generated CAR
T cells that were programmed
to recognize these synthetic
antigen tags. When the E. coli
probiotic was administered, the
CAR T cells could be directed
to the solid tumors, where they 
orchestrated tumor cell killing in 
experimental models of breast 
and colon cancer. —PNK 

Science, add7034, this issue p. 211;
see also adk6098, p. 154


PHYSICAL CHEMISTRY 
Spin selectivity 
crossing a bridge 
Chirality-induced spin selectivity 
has undergone intensive study in 
the two decades since its discov- 
ery. Essentially, the phenomenon 
manifests as polarization of elec- 
tron spin by chiral molecules, 
although the observations thus 
far have pertained to samples 
adsorbed on a solid substrate. 
Eckvahl et al. report significant 
chirality-induced spin selectivity 
signatures during intramolecular 
electron transfer between donor 
and acceptor fragments across 
a chiral bridge in free-floating 
molecules (see the Perspective 
by Subotnik). The precise tun- 
ability and tractability of these 
systems should enable system- 
atic comparisons with evolving 
theoretical models. —JSY 
Science, adj5328, this issue p. 197; 
see also adk5634, p. 160 


IMMUNOLOGY 
Source of IFN
signal strength

Type I and III interferons (IFNs)
act through distinct receptors
but activate the same downstream
kinases and effectors.
Using chimeric receptors,
Mesev et al. investigated why
type I receptors signal more
strongly than type III receptors.
In response to ligand, chimeric
receptors based on type I IFN
receptors exhibited greater
signal strength, which was due
to a short kinase-binding motif
that is not found in type III receptors.
These results suggest that
the intracellular domains of type
I and III IFN receptors encode
signal strength and facilitate differential
responses. —JFF
Sci. Signal. (2023) 
10.1126/scisignal.adf5494 
ISLAND BIOGEOGRAPHY
Drivers of diversity in
dung beetles


Island biodiversity depends on island size
and the ability of organisms to disperse
from the mainland, as well as the local
environment and human activities.
Tonelli et al. found that of these, island
size and historical colonization by humans
explain dung beetle diversity across the
Macaronesian and Mediterranean islands.
More species occur on larger islands but
also on those that were colonized by people
earlier (as far back as 18,000 years ago).
Humans may have accidentally aided in dung
beetle dispersal, in addition to introducing
domesticated animals that produce copious
dung. However, current human population
density does not explain beetle diversity,
suggesting that traditional rather than more
modern human activities benefit these generalist
species. —BEL

J. Biogeogr. (2023) 10.1111/jbi.14715


Dung beetle diversity on Mediterranean
and north Atlantic islands reflects
patterns of past human activity.


STEM EDUCATION
Redesigning
qualifying exams

Multiple gatekeepers exist in
graduate STEM education.
Doctoral qualifying exams
remain an understudied example
despite their implications for
degree completion, equity,
and student well-being. Liera
et al. applied concepts from
sociocultural theory to better
understand how faculty and
students perceive qualifying
exams at an individual level
and how doctoral programs
facilitate student learning at the
organizational level. A comparative
case study of two doctoral
programs provided insights on
different pathways for creating
more inclusive and developmentally
oriented qualifying exams.
One illustrates how programs
can reconceptualize candidacy
requirements in terms of learning
scientific community norms,
and the second highlights how
programs can eliminate harmful,
high-stakes exams while
continuing to assess content
knowledge. These results should
encourage graduate programs
to design and implement similar
structures that facilitate doctoral
student learning. —MMc

Phys. Rev. Phys. Educ. Res. (2023)
10.1103/PhysRevPhysEducRes.19.020110


HYDROGELS
Using oxygen to make
surfaces slippery


Hydrogels, three-dimensional
networks of cross-linked polymers
that are highly swollen with
water, are used to mimic soft biological
materials such as articular
cartilage. The extent of cross-linking
controls the mechanical
and transport properties, so any
approach to tuning the surface
properties ideally would not
affect the bulk material. Chau et
al. assessed the effect of changing
oxygen environments on
the synthesis of polyacrylamide
hydrogels. Raising the oxygen
content to 20 mole percent,
which inhibits free radical polymerization,
led to the formation
of a thicker, preswollen surface
gel layer that lowers the coefficient
of friction by an order of
magnitude without substantially
influencing the elastic modulus.
The authors developed a model
to enable prediction of surface
profiles based on reaction conditions.
—MSL
ACS Appl. Mater. Interfaces (2023)
10.1021/acsami.3c04636


PHYSICS
Precisely measuring
Earth’s rotation


Earth's rotation is frequently
perceived to be constant, yet
minuscule variations resulting
from the planet's nonspherical
shape and irregular mass
distribution can be detected if
measured with enough precision.
Previous measurements
using interferometry and
satellite systems suffer from
instability and lack adequate
sensitivity. Schreiber et al.
devised an inertial platform
using a large-ring laser single-
component gyroscope in which
two coherent light beams travel
in opposite directions around a
square closed loop. Under rotational
motion, the cavity lengths
are unequal, producing a beating
superposition of the two laser
beams which is proportional to
the rotation rate. The system
allows the determination of
Earth's rotation with millisecond
precision over 120 days of continuous
measurements. —ECF
Nat. Photonics (2023)
10.1038/s41566-023-01286-x


CANCER 
How metastases hack 
the brain 


Brain metastases are often 
associated with cognitive impair- 
ments. However, predicting how 
these metastases will affect 
brain function remains an unmet 
challenge. Sanchez-Aguilera et al. 
applied computational modeling 
and machine learning to predict 
the effect of brain metastasis on 
brain function. Using in vivo elec- 
trophysiology, ex vivo calcium 
imaging, and transcriptomic data 
obtained in three rodent models 
of brain metastasis, the authors 
identified brain activity associ- 
ated with specific subtypes of 
metastasis. Although the trans- 
lational relevance remains to be 
seen, the results pave the way for 
the development of models able 
to predict the impact of brain 
metastasis on cognition based on 
electroencephalographic record- 
ings. —MMa 
Cancer Cell. (2023) 
10.1016/j.ccell.2023.07.010


Exploiting the Marcus inverted region for first-row
transition metal-based photoredox catalysis


Amy Y. Chan, Atanu Ghosh, Jonathan T. Yarranton, Jack Twilton, Jian Jin,
Daniela M. Arias-Rotondo, Holt A. Sakai, James K. McCusker*, David W. C. MacMillan*


Second- and third-row transition metal complexes are widely employed in photocatalysis, whereas 
earth-abundant first-row transition metals have found only limited use because of the prohibitively fast 
decay of their excited states. We report an unforeseen reactivity mode for productive photocatalysis 
that uses cobalt polypyridyl complexes as photocatalysts by exploiting Marcus inverted region behavior 
that couples increases in excited-state energies with increased excited-state lifetimes. These cobalt(III)
complexes can engage in bimolecular reactivity by virtue of their strong redox potentials and sufficiently long 
excited-state lifetimes, catalyzing oxidative C(sp2)-N coupling of aryl amides with challenging sterically 
hindered aryl boronic acids. More generally, the results imply that chromophores can be designed to increase 
excited-state lifetimes while simultaneously increasing excited-state energies, providing a pathway for the 
use of relatively abundant metals as photoredox catalysts. 


Photoredox catalysis has enabled previously
elusive transformations to access
value-added products through the selective
activation of chemical bonds to generate
reactive radical intermediates (1, 2).
A key component in the generation of these open- 
shell intermediates is the photocatalyst, typically 
a second- or third-row transition metal complex, 
such as a Ru(II) or Ir(III) polypyridyl species,
capable of absorbing visible light (3, 4). Upon 
visible-light excitation, these metal complexes 
generate a long-lived charge-transfer excited 
state with lifetimes on the order of microseconds 
(5-7). This longevity enables the excited state to 
engage in single-electron or energy transfer either 
with a transition metal catalyst or directly with 
an organic substrate (8, 9). 

Unfortunately, ruthenium and iridium are 
two of the least abundant elements in Earth’s 
crust. Although some engineered organic dyes 
have achieved comparable efficiency, tuning 
their redox windows requires the separate 
synthesis of each independent catalyst scaffold
(10, 11). This lack of flexibility has precluded
the use of organic dyes as a complete
replacement for transition metal-based photo- 
catalysts. By contrast, tuning the redox proper- 
ties of metal complexes is often as simple as 
altering the ligands around the metal center. 
The wealth of commercially available ligands 
and knowledge of their impact on the elec- 
tronic structure of the compound provides a 
synthetically accessible and predictable manner
in which to tune the redox properties of
metal complexes. For this reason, there has
been considerable interest in exploiting this
flexibility toward the development of cost-effective
photocatalysts by using more earth-abundant
first-row transition metals (12, 13).


Limitations of first-row metal-based photocatalysis 


With few exceptions, little progress has been made 
toward this goal (14-16). The smaller ligand-field 
splitting endemic to first-row transition metals 
lowers the energy of the ligand-field states below 
that of the charge-transfer state (17). Upon visible- 
light excitation, these chromophores undergo 
rapid (subpicosecond) deactivation out of their 
charge-transfer manifolds (18, 19). Efforts to 
lengthen the charge-transfer lifetimes of first-row
transition metal chromophores by destabilizing
the ligand-field manifold with strong,
σ-donating ligands (20, 21) tend to require
ligands that are challenging to design and synthesize
(22, 23). An attractive complementary
approach would be to instead leverage the en- 
ergy stored in these ligand-field excited states. 
In addition to the economic and environmental 
benefits of shifting to first-row metal complexes, 
elucidating the fundamental photophysics and 
photochemistry of these ligand-field excited 
states promises enhanced selectivity, as well as 
unlocking distinct chemical mechanisms and 
transformations (Fig. 1). 

The problem with this approach is caused by 
the shorter lifetimes typically encountered for 
ligand-field excited states and how these lifetimes 
trend with their free energies. For example, 
the excited-state lifetime of [Fe(tren(py)3)]2+
[where tren(py)3 is a hexadentate polypyridyl
ligand] is 55 ns in room-temperature fluid 
solution. The redox potential (Eo) associated 
with this compound’s lowest-energy excited 
state was determined to be in the range of 0.6 
to 0.7 V versus saturated calomel electrode (SCE) 
(24). Increasing this potential requires increas- 
ing the energy of the ligand-field excited state, 
which can be achieved by replacing tren(py)3
with 2,2'-bipyridine to yield [Fe(bpy)3]2+. Although
this substitution nearly doubles the
energy of the photoactive excited state, its lifetime
is decreased to 1 ns under identical conditions.
Further increases in excited-state energy
will push the lifetime into the subnanosecond 
regime, severely undercutting the photocatalysts’ 
capacity to engage in bimolecular chemistry. 


Leveraging Marcus theory 


The correlation between ligand-field strength 
and excited-state lifetime can be understood 
within the framework of Marcus theory (25-27). 
Although typically invoked in the context of 
electron-transfer reactions, Marcus theory is 
a special case of nonradiative decay theory. 
Its basic principles can be applied to a much 
wider range of physical and photophysical 
phenomena, including excited-state relaxation
dynamics (28). In the Marcus normal region,
the zero-point energy of the ligand-field
excited state, which equates to its “driving force”
for ground-state recovery (ΔG°), is smaller in
magnitude than the reorganization energy, λ,
associated with converting from that state back
to the ground state. This leads to a condition in
which an increase in ligand-field strength results 
in an increase in the rate of excited-state relax- 
ation (i.e., shorter excited-state lifetime) (Fig. 2A, 
left). Photocatalysts designed for energetically 
demanding reactions need higher excited- 
state potentials and, thus, increased ligand-field 
strength. However, the inverse relationship be- 
tween driving force and excited-state lifetime 
means that an increase in ligand-field strength 
also leads to decreased excited-state lifetimes. 

In complexes of Co(III), the first-row congener
of Ir(III), the ligand-field strength associated
with Co(III) is intrinsically larger than that for
Fe(II). The observed lifetimes of compounds
such as Co(acac)3 and [Co(en)3]3+ of 2 ps and
450 ps, respectively (29-31), would therefore 
appear to validate the expectation of decreased 
lifetime with increasing ligand-field strength. 
To date, efforts to develop Co(III)-based photo- 
catalysts have been hindered by the lifetimes 
being too short to effect bimolecular chemistry, 
with limited exceptions (32-37). However, upon 
closer inspection, we realized that a different 
situation may pertain for Co(III).

The phenomenology of decreasing excited- 
state lifetime with increasing driving force—the 
Marcus normal region—remains valid until the 
driving force and reorganization energy exactly 
offset each other. This point, where |ΔG°| = λ,
is the barrierless region, where the activation 
energy for the process in question disappears 
and the rate of the reaction is dictated solely by 
the electronic coupling between the two states. 
Marcus’ counterintuitive prediction was that, 
as the driving force is increased beyond this 
point, the barrier is reintroduced (Fig. 2A, right) 
and the rate of the process should start to 
slow down again. This is the inverted region
(38, 39). A recent steady-state spectroscopic
study of a series of Co(III) complexes provided
quantitative information concerning
ligand-field-state energetics (40). It was determined
that the ligand-field strength associated with
[Co(en)3]3+ is nearly 0.6 V larger than that of
Co(acac)3, yet the data cited above reveal a
100-fold longer excited-state lifetime. We therefore
hypothesized that the photophysics of
Co(III) complexes might be occurring in the 
Marcus inverted region (38, 39). 
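
The normal, barrierless, and inverted regimes described above can be illustrated with the classical Marcus expression for the activation free energy, ΔG‡ = (ΔG° + λ)² / (4λ). The following is a minimal sketch, not the authors' analysis: the function names and the example driving forces are hypothetical, the prefactor (electronic coupling) is held fixed, and λ = 0.55 eV is used only because that is the value the text quotes for the cobalt series. Real ligand-field relaxation is usually treated with semiclassical models that include vibrational modes, so the classical form tends to exaggerate how steeply the rate falls in the inverted region.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def marcus_barrier(delta_g0_ev, lam_ev):
    """Classical Marcus activation free energy (eV): (dG0 + lambda)^2 / (4 lambda)."""
    return (delta_g0_ev + lam_ev) ** 2 / (4.0 * lam_ev)


def relative_rate(delta_g0_ev, lam_ev, temperature_k=298.0):
    """Rate relative to the barrierless case (|dG0| = lambda), prefactor held fixed."""
    barrier = marcus_barrier(delta_g0_ev, lam_ev)
    return math.exp(-barrier / (K_B_EV_PER_K * temperature_k))


def region(delta_g0_ev, lam_ev):
    """Classify a relaxation process by comparing driving force with lambda."""
    driving_force = -delta_g0_ev  # driving force is the magnitude of a negative dG0
    if math.isclose(driving_force, lam_ev):
        return "barrierless"
    return "normal" if driving_force < lam_ev else "inverted"


LAMBDA_EV = 0.55  # reorganization energy quoted in the text for the Co(III) series

# Hypothetical driving forces (eV), chosen only to span the three regimes.
for dg0 in (-0.25, -0.55, -0.85, -1.15):
    print(
        f"dG0 = {dg0:+.2f} eV  region = {region(dg0, LAMBDA_EV):11s}  "
        f"barrier = {marcus_barrier(dg0, LAMBDA_EV):.3f} eV  "
        f"relative rate = {relative_rate(dg0, LAMBDA_EV):.3e}"
    )
```

In this sketch the barrier vanishes when the driving force equals λ and grows again on either side, so a larger driving force beyond that point gives a slower decay, that is, a longer excited-state lifetime.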


Photophysics of prospective Co catalysts 


To examine this possibility, we measured the 
excited-state lifetime of [Co(bpy)3]3+, a compound
characterized by an even stronger ligand field 
than ethylenediamine, and observed an order- 
of-magnitude increase in excited-state lifetime 
to 5.0 ns. We then expanded this effort by using 
experimentally determined excited-state energies 
as a guide and measured excited-state lifetimes 
for a series of homologous Co(III) polypyridyl
complexes (40). The Marcus plot generated from 
these measurements is shown in Fig. 2B. The 
data confirm that the excited-state dynamics of 
this series of compounds are occurring in the 
Marcus inverted region, characterized by a reorganization
energy of ~4500 cm⁻¹ (~0.55 eV),
and exhibit lifetimes that should make these 
complexes viable for bimolecular chemical re- 
actions. Marcus inverted behavior, in which in- 
creasing the ligand-field strength builds in more 
potent excited-state power while simultaneously 
increasing excited-state lifetimes, opens up enor- 
mous possibilities for the use of more Earth- 
abundant compounds in place of precious metal 
catalysts for photoredox catalysis. 
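
The reorganization energy above is quoted in both wavenumbers and electron volts. As a quick check of that unit conversion, the sketch below applies E = hc·ν̃ using standard values of the physical constants; the function names are hypothetical and are not taken from the paper.

```python
PLANCK_EV_S = 4.135667696e-15        # Planck constant in eV*s
SPEED_OF_LIGHT_CM_S = 2.99792458e10  # speed of light in cm/s


def wavenumber_to_ev(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to an energy in eV via E = h * c * wavenumber."""
    return PLANCK_EV_S * SPEED_OF_LIGHT_CM_S * wavenumber_cm1


def ev_to_wavenumber(energy_ev):
    """Convert an energy in eV to a wavenumber in cm^-1."""
    return energy_ev / (PLANCK_EV_S * SPEED_OF_LIGHT_CM_S)


print(f"4500 cm^-1 corresponds to {wavenumber_to_ev(4500.0):.3f} eV")
print(f"0.55 eV corresponds to {ev_to_wavenumber(0.55):.0f} cm^-1")
```

The two quoted figures agree to within the rounding implied by the "~" signs in the text.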

We first sought to evaluate the competency 
of these Earth-abundant cobalt complexes as 
photocatalysts by benchmarking their reactivity
against that of precious metal-based photocatalysts.
The excited state of the Co(III) ion
is expected to be oxidative given that Co(II) is
more stable than Co(IV) in simple coordination
complexes. In this regard, a transition to
Co(III) from Ru(II) or Ir(III) could potentially
catalyze similar reactions. We selected reported
photoredox transformations that use
precious metal photocatalysts to enable the
decarboxylative C-H functionalization of heteroarenes,
the C-H phosphonylation of arenes,
and the α-arylation of ethers (41-43). We discovered
that our cobalt complexes, used under
unoptimized conditions in lieu of iridium or
ruthenium polypyridyl photocatalysts, furnished
the desired products in high yields (fig. S15). Furthermore,
these cobalt photocatalysts are competent in 
metallaphotoredox platforms, as demonstrated 
by replacing iridium with our cobalt complexes 
in the N-alkylation platform achieved through 
the merger of photoredox catalysis and copper- 
mediated C-N bond formation (fig. S15) (44). 
Cobalt thus serves as an Earth-abundant replacement
for expensive precious metals in
metallaphotoredox catalysis. More importantly,
we sought to examine whether the oxidizing, long-lived
ligand-field excited state of these cobalt catalysts
could unlock reactivity that is elusive with
their precious metal counterparts.


[Fig. 1 panel text: Ideal photocatalyst: abundant and cheap; high excited-state energy; long excited-state lifetime. How can we simultaneously increase the ligand-field energies and lifetimes of first-row transition metal chromophores to unlock reactivity? (A) Applying Marcus theory to ligand-field excited state dynamics.]

Fig. 1. Exploiting Marcus inverted kinetics for the development of novel Earth-abundant transition
metal photocatalysts. (A) Marcus inverted dynamics present the opportunity to simultaneously increase
excited-state energies and lifetimes. (B) Previously unknown class of Marcus inverted cobalt photocatalysts
for C(sp2)-N coupling.


Reaction development 


C(sp2)-N bonds are ubiquitous in pharmaceuticals
and bioactive molecules, and robust
methods to forge such connections are of high 
interest to synthetic chemists (45, 46). Current 
methods to achieve these bond connections 
include palladium-catalyzed Buchwald-Hartwig 
amination, copper-catalyzed Ullmann-Goldberg 
coupling, and copper-catalyzed Chan-Evans-Lam 
coupling (47-49). These methods have rapidly 
achieved widespread adoption in industry. 
The utility of these synthetic transformations
stems from the prevalence of nitrogen nucleophiles
as readily available building blocks.
However, these methods can be limited owing 
to the necessity of forcing reaction conditions 
such as high temperatures, strong bases, and 
oxidants, resulting in diminished functional 
group compatibility and poor regioselectivity. 
Furthermore, the scope of these reactions is re- 
stricted to sterically less-hindered coupling part- 
ners because of metal-mediated bond breaking 
and forming processes, including challenging 
oxidative addition and transmetalation elementary
steps. For example, the key elementary step
in Chan-Evans-Lam coupling is identified as
transmetalation, which goes through a four-membered
transition state akin to a σ-bond
metathesis (50-52). In the case of sterically
hindered boronic acids, transmetalation is
disfavored because of the steric encumbrance
present in the metallocyclobutane transition
state. Thus, Chan-Evans-Lam couplings of
ortho-substituted boronic acid substrates proceed
with diminished efficiency. Therefore, the challenges
of current C(sp2)-N bond forming methods
underscore the need to develop distinct methods
and mechanistic paradigms for the efficient
synthesis of a diverse range of C(sp2)-N
bonds of various substitution patterns, especially
for sterically demanding cross-coupling
partners.

We envisioned that achieving the oxidative
coupling of boronic acids with nitrogen nucleophiles
such as aryl amides would be extremely
desirable given the ready availability of these
coupling partners and their compatibility
with the highly oxidizing Co(III) photocatalyst
(53, 54). Direct oxidation of the nitrogen
nucleophile by our Co(III) photocatalysts could
furnish an N-centered radical, poised to undergo
a metal-free ipso-substitution with the boronic
acid partner to furnish the desired C(sp2)-N
product (55). The metal-free bond formation
would allow for the incorporation of more
sterically encumbered partners. A proposed
mechanism for the described reaction is shown
in Fig. 3A (left). Photoexcitation generates a
highly oxidizing excited-state *Co(III), capable
of converting the amide partner to the corresponding
amidyl radical. An external oxidant
oxidizes the photostable Co(II) complex to
Co(III) species, turning over the cobalt catalyst
and closing the catalytic cycle. The key C-N
bond-formation step proceeds by substitution
of the amidyl radical at the ipso-carbon
of the boronic acid, followed by oxidation of
1a by Co(II) or the external oxidant and
rearomatization of 1b to furnish the desired
product, 1 (56).


[Fig. 2 panel text: (A) Marcus inverted region presents the opportunity to simultaneously increase ligand-field energies (ΔG°) and lifetimes (ΔG‡).]

Fig. 2. Excited-state dynamics of cobalt photocatalysts in the Marcus
inverted region. (A) Excited-state decay of transition metal complexes as
described by Marcus theory. (Left) Marcus normal region where the excited-state
lifetime decreases with increasing driving force. (Right) Marcus inverted region
where excited-state decay rates slow down and lifetimes increase when the
driving force exceeds the reorganization energy associated with ground-state
recovery. (B) (Left) The excited-state lifetimes of the Co(III) polypyridine
complexes systematically increase with increasing driving force for ground-state
recovery, which is consistent with Marcus inverted region behavior. (Right)
A plot of the experimentally determined excited-state lifetimes as a function
of the driving force for ground-state recovery. The latter were obtained
from density functional theory (DFT) calculations of the excited-state energies
that follow the general trend obtained from experimentally determined
spectroscopic transitions. The data are well described by Marcus theory,
revealing a reorganization energy for this isostructural series of chromophores
of 0.55 eV (4500 cm⁻¹), significantly smaller than the driving force and
confirming that the dynamics of this class of compounds occur in the
inverted region.


[Fig. 3 panel text: (A) Design plan: Can cobalt unlock reactivity?]

Fig. 3. Application of cobalt photocatalysts toward C-N coupling. (A) Proposed catalytic cycle of cobalt photocatalyzed C(sp2)-N coupling of aryl amide and
aryl boronic acids. (B) Evidence of the intermediacy of an N-centered amidyl radical species. (C) Probing the intermediacy of an aryl radical species. (D) Stern-Volmer
plot of bimolecular quenching kinetics between [Co(4,4'-Br2bpy)3](PF6)3 and acetanilide as determined with time-resolved absorption spectroscopy. The data
represent the average of two replicate measurements and were fit to a simple linear regression.

We explored this idea in the context of the 
model arylation of amides, notorious for their 
high oxidation potentials, and were delighted to 
find that [Co(4,4'-Br2bpy)3](PF6)3 catalyzes the reaction
between N-phenylacetamide [Ep = +1.67 V
versus SCE in MeCN (fig. S16)] and phenyl boronic
acid in 87% yield (Fig. 3A, right). In
fact, the transformation could not be achieved
in high yields with well-established Ir(III),
Ru(II), or organic photocatalysts (Fig. 3A, bottom),
demonstrating the potential of these Co(III) 
catalysts to unlock distinct reactivity and mech- 
anistic paradigms. 


Mechanistic investigation 


The ground-state electronic absorption spectrum
of [Co(4,4'-Br2bpy)3]3+ shows a ligand-field
transition (1A1 → 1T1) in the visible region
at 461 nm (fig. S6 and table S2). This confirms
that the Co(III) complex absorbs visible light
to access a 1T1 ligand-field excited state, followed
by intersystem crossing (ISC) to its lowest-energy
ligand-field excited state, which serves
as the photoactive species. We carried out a series
of measurements following the approach of
Meyer and co-workers (57), designed
to bracket the excited-state oxidation potential
of [Co(4,4'-Br2bpy)3](PF6)3, which was determined
to be E1/2 *[Co(III)/Co(II)] ~ +1.65 V versus SCE in
MeCN. This cobalt complex is more oxidizing
than the highly oxidizing iridium photocatalyst
[Ir(dF(CF3)ppy)2(dtbbpy)](PF6) (E1/2 *[Ir(III)/
Ir(II)] = +1.21 V versus SCE in MeCN) (58).


Fig. 4. Scope of cobalt photocatalyzed C(sp2)-N cross-coupling of aryl amides and aryl boronic acids. All yields are isolated (detailed reaction conditions
provided in supplementary materials).


Chan et al., Science 382, 191-197 (2023)


Upon optimization, we were able to simplify 
the procedure by complexing and oxidizing 
the commercially available Co(acac)2 and 4,4'-Br2bpy
ligand in situ to form the active catalyst
before irradiation (table S3). A variety of oxidants
(table S4) could effect the desired chemistry,
and persulfate was found to provide the
highest yields. Control experiments revealed the 
necessity of the cobalt salt, bipyridine ligand, 
and oxidant in the desired transformation; C-N 
bond formation could not be achieved ther- 
mally (table S6). Overall, these findings are con- 
sistent with a cobalt-photocatalyzed reaction 
proceeding by means of a radical pathway. 

When amide substrate 2 (Ep = +1.70 V versus
SCE in MeCN; fig. S17) (Ep, peak potential)
was subjected to the standard reaction conditions,
cyclized product 3 was observed in 15%
yield, and C(sp2)-N coupling product was observed
in 5% yield (Fig. 3B and fig. S18). In the
absence of the boronic acid coupling partner, 
cyclized product 3 is produced exclusively in 
15% yield. Formation of 3 is proposed to proceed 
through 5-exo cyclization of the amidyl radical 
2a followed by C-S bond homolysis of 2b to 
regenerate the double bond (59, 60). No reaction 
was observed in the absence of cobalt (table S8), 
supporting its role in the oxidation of the amide 
and generation of the N-centered amidyl rad- 
ical intermediate. Furthermore, Stern-Volmer 
bimolecular quenching studies performed with 
time-resolved absorption spectroscopy indicate 
that the amide partner quenches the excited 
state of the Co(III) photocatalyst (Fig. 3D and 
table S9). The observation of photostable re- 
duced Co(II) photocatalyst in the reaction in situ 
by means of paramagnetic photo-nuclear mag- 
netic resonance (photo-NMR) confirms the 
reduction of the highly oxidizing Co(III) photocatalyst
(fig. S20). Upon subjecting an analogous
boronic acid with a pendant olefin 4 to
the reaction conditions, no cyclized product 5
was observed (61-63), and only the C(sp2)-N
coupled product was observed in 18% yield
(Fig. 3C and fig. S19). These results suggest 
that amidyl radical formation is an operative 
and productive pathway, whereas oxidative 
aryl radical generation is unlikely. 
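
The Stern-Volmer analysis mentioned above rests on the relation τ0/τ = 1 + kq·τ0·[Q], so a linear fit of τ0/τ against quencher concentration gives the Stern-Volmer constant K_SV = kq·τ0 as its slope. The sketch below shows that fit with an ordinary least-squares regression. It is an illustration under stated assumptions, not the authors' data or code: the concentrations, the unquenched lifetime of 5.0 ns, and the K_SV of 2.0 M⁻¹ are hypothetical values used to generate synthetic lifetimes, and the function names are invented for this example.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def stern_volmer(quencher_molar, lifetimes_ns, tau0_ns):
    """Fit tau0/tau versus [Q]; return (K_SV in M^-1, intercept, kq in M^-1 s^-1)."""
    ratios = [tau0_ns / tau for tau in lifetimes_ns]
    k_sv, intercept = linear_fit(quencher_molar, ratios)
    k_q = k_sv / (tau0_ns * 1e-9)  # convert the lifetime from ns to s
    return k_sv, intercept, k_q


# Synthetic, hypothetical data: NOT measurements from the paper.
TAU0_NS = 5.0        # assumed unquenched excited-state lifetime
TRUE_K_SV = 2.0      # assumed Stern-Volmer constant in M^-1
concentrations = [0.0, 0.05, 0.1, 0.2, 0.4]  # quencher concentration in M
lifetimes = [TAU0_NS / (1.0 + TRUE_K_SV * c) for c in concentrations]

k_sv, intercept, k_q = stern_volmer(concentrations, lifetimes, TAU0_NS)
print(f"K_SV = {k_sv:.3f} M^-1, intercept = {intercept:.3f}, kq = {k_q:.3e} M^-1 s^-1")
```

A slope clearly above zero with an intercept near 1 is the signature of dynamic quenching of the excited state by the added substrate, which is the behavior the text reports for the amide partner.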


Substrate scope exploration 


Lastly, we examined the generality of our dis- 
covered protocol. Elucidation of the scope of 
this transformation demonstrated that a range 
of steric and electronic substitutions on the aryl 
coupling partners are compatible in this reac- 
tion (Fig. 4). Electron-deficient N-aryl amides 
bearing trifluoromethyl, cyano, carboxy, and 
sulfonazido groups performed admirably (6 to 
9, 80 to 91% yield). Moreover, substrates bear- 
ing halogen atoms, which can engage in subse- 
quent coupling platforms, reacted to generate 
the diaryl amide product cleanly under the op- 
timized conditions (10 to 13, 90 to 98% yield). 
Electron-rich acetanilides with functionality at
the ortho- and para-positions also performed
well and did not suffer overoxidation (14 and 
15, 98% yield). Furthermore, placement of an 
electron-donating or -withdrawing group at 
the meta-position did not reduce efficiency (16 
and 17, 91 and 86% yields, respectively). 

The aryl amide scope can be extended to 
cyclic and acyclic amides with varying substi- 
tution patterns. Formanilide and simple alkyl 
substituted amides all delivered the desired cou- 
pled products in excellent yield (18 to 20, 73 to 
85% yield). Notably, substrates containing satu- 
rated (hetero)cyclic fragments, motifs ubiquitous 
in medicinal agents, furnished the products in 
good yields (21 to 23, 68 to 85% yield). In addition,
benzocaprolactam and phthalazine-1-one
served as competent coupling partners (24 and
25, 92 and 89% yield, respectively), with the latter
suggesting opportunities for selective coupling
of other oxidizable nitrogen nucleophiles.

A diverse array of electron-rich aryl boronic 
acids performed well in this transformation (26 
and 27, 97 and 93% yield, respectively). Notably, 
the efficient formation of aldehyde-containing 
28 (92% yield) highlights the functional-group 
compatibility of the transformation because 
aldehydes can easily be overoxidized to the cor- 
responding carboxylic acids in the presence 
of metal and potassium persulfate. Boronic acids 
bearing electron-deficient functionalities, which 
are challenging in traditional Chan-Evans-Lam 
couplings, such as ketone and ester groups, also 
performed well in the reaction (29 and 30, 93 
and 97% yield, respectively). Also of note is that 
an N-alkyl amide was untouched by the reac- 
tion (31, 80% yield), presumably because of the 
higher oxidation potential of N-alkyl amides 
compared with that of N-aryl amides. As such, 
the reaction occurred with high chemoselec- 
tivity at the N-aryl amide. In addition to para- 
substituted boronic acids, aryl nucleophiles 
with meta-substitutions can be coupled under 
the optimal conditions (32, 98% yield). 

Most notably, this method is efficient in cou- 
pling sterically encumbered aryl boronic acids. 
A general issue of Chan-Evans-Lam reactions 
is ortho-substitutions on the aryl organoboron 
partner, in which increasing steric bulk around 
the borylated position renders the reaction less 
effective and often unsuccessful (64, 65). In 
our developed method, boronic acids bearing 
chloro, bromo, and phenyl substituents at the 
ortho-position were well accommodated (33 
to 35, 86 to 97% yields). Extended π-systems,
such as naphthyl groups, were also successfully 
amidated (36, 92% yield). The observed high 
chemoselectivity bodes well for the applica- 
tion of this chemistry in linchpin strategies 
for the differential functionalization of arene 
cores. Furthermore, expected steric limita- 
tions of Chan-Evans-Lam couplings include 
2,6-dimethylphenylboronic acid, which can 
render the reaction completely ineffective (66). 
However, this di-ortho aryl boronic acid was
successfully employed in our coupling (37,
91% yield). The high tolerance of our transformation
toward sterically hindered coupling
partners (e.g., ortho-substituted aryl boronic 
acids) supports the proposal that the bond- 
forming step is not metal-mediated. 


Outlook 


By increasing both the ligand-field energy and 
electronics, ground-state recovery of cobalt(III) 
in the Marcus inverted region can be leveraged 
as a design principle for other first-row metal-based
photocatalysts. This would open up
enormous possibilities for the use of Earth- 
abundant compounds for photoredox cataly- 
sis, helping to provide a sustainable future by 
unlocking previously unknown mechanisms 
and transformations.


Atomic faulting induced exceptional cryogenic strain
hardening in gradient cell-structured alloy


Qingsong Pan, Muxin Yang, Rui Feng, Andrew Chihpin Chuang, Ke An, Peter K. Liaw,
Xiaolei Wu, Nairong Tao, Lei Lu*


Coarse-grained materials are widely accepted to display the highest strain hardening and the best 
tensile ductility. We experimentally report an attractive strain hardening rate throughout the 
deformation stage at 77 kelvin in a stable single-phase alloy with gradient dislocation cells that even 
surpasses its coarse-grained counterparts. Contrary to conventional understanding, the exceptional 
strain hardening arises from a distinctive dynamic structural refinement mechanism facilitated by the 
emission and motion of massive multiorientational tiny stacking faults (planar defects), which are 
fundamentally distinct from the traditional linear dislocation-mediated deformation. The dominance of
atomic-scale planar deformation faulting in plastic deformation introduces a different approach for 
strengthening and hardening metallic materials, offering promising properties and potential applications. 


Strain hardening, also known as work hardening,
which dates back to the Bronze
Age, is the earliest widely used strategy to
strengthen metallic materials (1). Traditionally,
strain hardening results from a
marked increase in the number of typical linear
defects (i.e., dislocations) and their mutual interactions
in the crystalline lattice, which in turn
tends to gradually reduce dislocation mobility
(2-6). As a result, larger stresses are necessarily
applied so that additional deformation may
take place (1, 5, 6).

Generally, the soft coarse-grained (CG) 
metals display the highest strain hardening as 
well as the best tensile ductility because of the 
abundant space and the largest free path for 
movement and storage of dislocations (4, 7). 
Dislocation-dislocation interactions in the 
crystalline solid give rise to local dislocation 
tangles and eventually a three-dimensional 
(3D) network of dislocation patterns (i.e., cell 
walls), where further deformation becomes 
difficult (3, 4). In particular, the inherently 
thermal-assisted dislocation recovery and an- 
nihilation process gradually takes over to re- 
sult in a gradually saturated substructural size 
at micrometer or submicrometer scale and a 
common drop in hardening rate with increasing
strain (1, 4, 7).

The reduction in strain hardening becomes 
more pronounced in high-strength materials
(8-10). Traditional strengthening methodologies,
in which strength is achieved either by changing
composition or modifying hierarchical
microstructures, are invariably built on the
fundamental principle of blocking dislocation
motion through incorporating various defects
in a crystalline lattice, but they inevitably deteriorate
strain hardening capacity (1, 10-16). For
example, nanostructured metallic materials 
containing massive grain boundaries (GBs) 
and heavily deformed materials containing a 
high density of dislocations have notably ele- 
vated strength yet exhibit markedly reduced 
strain hardening and limited uniform ductility 
down to a few percent (8, 9). Strain hardening 
is essential because it effectively delocalizes flow 
strain, enhances tensile ductility, and inhibits 
catastrophic mechanical failure (1, 4, 6, 17). However,
substantial improvement in the strain
hardening of high-strength metallic materials 
has been one of the thorniest problems of ma- 
terials science over the past century. 
Decreasing the deformation temperature 
substantially elevates strain hardening, essen- 
tially arising from the enhanced dislocation 
activity, such as reduced dynamic recovery 
and/or annihilation and the corresponding 
increased dislocation storage rate for various 
materials (1, 4, 7, 11). When strain hardening 
is improved in this manner, deformation twin- 
ning and phase transformation may act as 
secondary carriers of crystal plasticity at low 
temperature (17-20). This scenario is particularly
important for traditional single-principal
element alloys with low stacking fault energies
(SFEs) (1, 11, 21) and more recently developed
high- to medium-entropy or multi-principal element
(MPE) alloy systems (22-30). The presence
of dynamic interactions between newly
formed interfaces and dislocations contributes
to a reasonable strain hardening capacity
[...] dislocations are still acknowledged as the leading
actor and dominate the plastic deformation
of all metallic materials at cryogenic temperature
(1, 4, 11, 31).

Previous studies have shown that spatially
heterogeneous nanostructures containing gra-
dient or bimodal grain sizes, multiple phases,
etc. can exhibit extra strain hardening
compared with their homogeneous components
because large plastic strain incompatibility
induces additional activity of geometrically
necessary dislocations (14-16, 32). However, the
observed hardening as a result of strain gra- 
dients tends to be maintained only at the small 
plastic strain stage (<5%) and decreases at large 
strains, even at low temperature (14, 15, 29, 32).

We recently discovered that cyclic torsion 
(CT)-induced ultrafine-scaled gradient dis- 
location cell structures (GDSs) in a single-phase 
Al0.1CoCrFeNi MPE alloy activate massive par-
allel stacking faults (SFs) at ambient tempera-
ture, which are responsible for a high strength
and considerable tensile plasticity (33). In- 
spired by this, we explored whether this GDS 
can effectively trigger SFs at low temperature 
to improve the strain hardening even at high
strength.

Instead of developing parallel SFs, we ob- 
served extensive proliferation of multidirec- 
tional, extremely fine planar SFs starting from 
initial cell walls that induced a progressive 
structure refinement into tiny mosaic hier- 
archy. Along with the formation of traditional 
dislocations, the mosaic structure contributed 
to an exceptional strain hardening that even 
surpassed its CG counterparts at cryogenic 
temperature. 


Gradient dislocation structure 


The coarse-grained (CG) Al0.1CoCrFeNi MPE alloy samples ini-
tially had randomly oriented grains with an av-
erage size of ~46 μm (fig. S1). We processed the
dog bone-shaped bar MPE specimens (Fig. 1A) 
using CT treatment at ambient temperature to 
obtain a sample-level hierarchical GDS structure 
in the gauge section (4.5 mm in gauge diameter, 
12 mm in gauge length) (Fig. 1B) [see (33) for 
processing details]. Both the grain size and 
morphology of the initial randomly oriented
grain structures were unchanged, having a ho- 
mogeneous distribution from the surface to 
the core of the GDS sample (Fig. 1D). In partic- 
ular, profuse low-angle boundaries, defined as 
those with a misorientation of <15°, were intro-
duced in the grain interior and spatially distrib-
uted. The density of low-angle boundaries de-
creases with increasing depth from the top
surface (Fig. 1E). We used transmission elec-
tron microscopy (TEM) observations to verify
that abundant equiaxed dislocation cells, with a 
misorientation of ~2° on average, were perva-
sively formed in the topmost grains (~100 μm
from the surface) (Fig. 1F). Each cell wall (~40-nm
thickness on average, corresponding to low-
angle boundaries) had a high density of dis-
locations, which we verified using selected area
diffraction patterns (Fig. 1F, inset). This fea-
ture separates the wall from the cell interior,
which had a low density of dislocations. We mea-
sured the cell diameter to be ~200 nm at the
top surface, gradually increasing to ~350 nm
at a depth of 0.45 mm (Fig. 1, F and G).

By contrast, individual dislocations and re- 
lated loose dislocation tangles are prevalently 
detected in grains at the core (Fig. 1H). We did 
not identify SFs or deformation twins, except 
for a few micrometer-spaced annealing twins 
(Fig. 1, E to H). The GDS sample is still in a state 
of a stable, single face-centered cubic (fcc) phase, 
as we show with our electron backscatter dif- 
fraction (EBSD) and TEM results (Fig. 1, D to 
H). This sample-level gradient dislocation ar- 
chitecture results in a distinct gradient dis- 
tribution of microhardness (fig. S2)—gradually 
decreasing from 3.7 GPa at the topmost sur- 
face to 3.1 GPa at a 0.45-mm depth and 2.2 GPa
in the central region of the as-prepared GDS 
sample, much larger than that of the dislocation- 
free CG (~1.7 GPa). 
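
A rough conversion from these hardness values to strength can be sketched with the Hv/3 estimate that the Fig. 2 caption mentions for the topmost GDS surface. The snippet below is a minimal illustration, not the authors' procedure: the function name is hypothetical, and applying a single constraint factor of 3 at every depth is an assumption made here for simplicity.

```python
def strength_from_hardness(hv_gpa, constraint_factor=3.0):
    """Estimate flow strength (GPa) from microhardness (GPa) as Hv / 3.

    The factor of 3 is the common Tabor-type rule of thumb; treating it as
    constant across the gradient structure is an assumption of this sketch.
    """
    return hv_gpa / constraint_factor


# Microhardness values quoted in the text (GPa).
hardness_gpa = {
    "GDS topmost surface": 3.7,
    "GDS at 0.45-mm depth": 3.1,
    "GDS core": 2.2,
    "CG": 1.7,
}

estimates = {name: strength_from_hardness(hv) for name, hv in hardness_gpa.items()}

for name, sigma in estimates.items():
    print(f"{name}: ~{sigma:.2f} GPa")
```

The estimates only preserve the ordering of the hardness gradient; they are not measured yield or flow stresses.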


Strain hardening ability at 77 K 


Both engineering (Fig. 2A) and true (Fig. 2B) 
stress-strain curves of quasistatic uniaxial ten- 
sile tests show that GDS samples have markedly 
elevated strength and ductility when reducing 
the temperature from 293 K to 77 K, similar to 
those of single-phase fcc metals and MPE alloy 
systems reported in the literature (1, 24, 26-28). 
Impressively, true yield strength (σy, at 0.2%
offset) of the bulk GDS sample at 77 K is ~0.7 GPa,
and its true ultimate tensile strength (σUTS) is
more than 1.8 GPa, which is much higher than 
that of CG samples. In addition, a large strain 
hardening coefficient (i.e., true uniform elon- 
gation) is detected for the GDS alloy, ~0.48, 
slightly reduced relative to its CG counterpart 
(0.55) at 77 K (Fig. 2B). 

Because the high density of dislocation cells 
is spatially concentrated in the surface layer of

Fig. 1. Typical microstructure of GDS alloy. (A) The image of the dog bone-shaped GDS bar sample,
of which the gauge section was processed by means of CT processing. (B) Cross-sectional schematic
showing a sample-level GDS structure from surface (dark blue) to core (light gray). (C) 3D x-ray tomographic
reconstruction of the gauge section of a GDS tubular sample (~0.45-mm thickness). (D and E) Cross-
sectional EBSD images of the GDS alloy showing the distributions of grain-scaled morphology, orientation
(D), and three types of boundaries defined with different misorientation angles (E) within a depth of 1.0 mm
from the surface. HAGB, high-angle grain boundary; LAB, low-angle boundary. (F to H) The bright-field
TEM images [(F) and (G)] at the positions of the white squares in (D) and that at the core (H), showing a
hierarchical GDS structure. The insets in (F) to (H) are the corresponding selected area electron
diffraction (SAED) patterns.
GDS samples, we removed the inner 3.6-mm-
diameter core of GDS samples (Fig. 1C) to
highlight the mechanical behavior of the high-
density dislocation cell structure itself. The re-
sulting GDS tubular specimens, which retain
only the surface architecture (~0.45-mm
thickness) (hereafter referred to as the surface
GDS), show a further enhanced strength at
77 K, i.e., true σy and σUTS up to ~0.9 GPa and
2.4 GPa, respectively, and a considerable true
uniform elongation as large as ~50%, compa-
rable to that of the bulk GDS and CG counter-
parts (Fig. 2B).

Analysis of the tensile test results for the CG
sample yields a monotonically decreasing strain
hardening rate (Θ) upon straining beyond 1% at both
293 K and 77 K (Fig. 2C). The strain
hardening rates of both the GDS and surface 
GDS samples at 293 K are still lower than that 
of CG sample owing to the presence of numer- 
ous preexisting dislocations, consistent with 
the literature data (9, 14, 27, 28). By contrast, 
we observed an unexpectedly high Θ in the
GDS at 77 K, even higher than that of its CG
counterpart, throughout the entire plastic de-
formation stage. Notably, the strain harden-
ing rate of the surface GDS is further
elevated, with Θ gradually decreasing
from 4.2 GPa at a 3% strain to 2.4 GPa before
necking, compared with CG (from 3.1 to 1.6 GPa)
and GDS specimens (Fig. 2C).
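
The strain hardening rate discussed above is the slope of the true stress-strain curve, Θ = dσ/dε. The sketch below shows one conventional way to obtain it numerically. It is not the authors' analysis code: the function names are hypothetical, the power-law curve is synthetic data chosen only because its answer is known analytically, and the use of the Considère criterion (necking when Θ falls to σ) to locate the uniform strain is a standard assumption that the text does not state explicitly.

```python
import numpy as np


def true_curve(eng_strain, eng_stress):
    """Convert engineering strain (as a fraction) and stress to true values.

    Valid only while deformation is uniform, i.e., before necking.
    """
    eng_strain = np.asarray(eng_strain, dtype=float)
    eng_stress = np.asarray(eng_stress, dtype=float)
    true_strain = np.log1p(eng_strain)
    true_stress = eng_stress * (1.0 + eng_strain)
    return true_strain, true_stress


def hardening_rate(true_strain, true_stress):
    """Strain hardening rate Theta = d(sigma)/d(epsilon) by finite differences."""
    return np.gradient(true_stress, true_strain)


def uniform_strain(true_strain, true_stress):
    """First true strain at which Theta <= sigma (Considere criterion), else None."""
    theta = hardening_rate(true_strain, true_stress)
    idx = np.flatnonzero(theta <= true_stress)
    return float(true_strain[idx[0]]) if idx.size else None


# Synthetic power-law curve sigma = K * eps**n (hypothetical K and n).
# For this form the Considere criterion predicts necking at eps = n.
eps = np.linspace(0.01, 0.8, 800)
sigma = 2.0 * eps**0.4

theta = hardening_rate(eps, sigma)
normalized = theta / sigma  # the normalized rate Theta/sigma
eps_u = uniform_strain(eps, sigma)
print(f"uniform true strain ~ {eps_u:.3f}")
```

With measured data, the finite-difference derivative amplifies noise, so some smoothing of the stress-strain curve is usually applied before differentiating.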

We observed the same trend in the strain
hardening rate normalized by flow stress (Θ/σ)
versus true stress (Fig. 2D), showing a higher
Θ/σ in GDS and the highest Θ/σ in surface GDS
compared with that of CG at 77 K. The nota-
bly superior strain hardening behavior agrees
well with the remarkable microhardness in- 
crement at the topmost GDS layer from the 
initial 3.7 GPa to 5.0 GPa after a 40% strain— 
the latter is much higher than that for the 
GDS core and CG counterparts (~4.1 GPa) at 
77 K (fig. S2). Such ultrahigh strain hardening
rate and ductility in GDS samples are rarely 
seen in conventional high-strength metallic 
materials and MPE alloys with dislocation-
dominated strain hardening (1, 11, 28), and
they indicate the presence of an unusual strain 
hardening mechanism inherent to gradient 
dislocation structures upon straining at 77 K. 

To demonstrate the excellent strength-strain 
hardening synergy of GDS samples at 77 K, we 
compare their uniform elongation versus ulti- 
mate tensile strength (Fig. 2E) with those of 
the same compositions with homogeneous 
and gradient-grained structures at 293 K and 
77 K (34-36) and with other MPE alloy systems 
(37-42) as well as the advanced cryogenic steels 
(21, 43, 44) at 4.2 K to 77 K, which correspond to 
various strengthening strategies. We normal- 
ized the strength by the Young’s modulus of the 
material. A superb ductile and strong GDS at 
77 K stands out from the general mechanical 
trade-off performance of other materials—i.e., 


[Fig. 2, A and B: plots of engineering stress (GPa) versus engineering strain (%) and true stress (GPa) versus true strain (%) for surface GDS, GDS, and CG samples at 293 K and 77 K.]


Fig. 2. Strain hardening and strength-ductility combination of GDS alloy 

at 77 K. (A and B) Tensile engineering (A) and true (B) stress-strain curves of 
GDS, surface GDS, and CG samples at 293 K and 77 K. (C and D) The 
corresponding strain hardening rate and true strain relations (C) and normalized 
strain hardening rate by flow stress and true stress relations (D). (E) Strength 
and ductility synergy of GDS alloy, compared with other high-performing 
materials at cryogenic temperature. Uniform elongation versus ultimate tensile 
strength normalized by Young's modulus of the GDS alloy, compared with those 


the overall strength gain usually comes at the 
expense of a severe ductility loss (Fig. 2E). 


Dynamic stacking faulted (SFed) mosaics 
induced strain hardening mechanism 


Focusing on deciphering the underlying un- 
usual strain hardening mechanism of GDS at 
cryogenic temperature, we further character- 
ized the microstructures at the top surface of 
GDS by interrupting tensile tests at the early 
stage (3%) (Fig. 3, A to C, and fig. S3) and the 
later stage of plasticity (40%) (Fig. 3, D to G, 
and fig. S4) at 77 K. At the strain of 3%, the 
dislocation cells remain almost unchanged in 
shape and size. Notably, numerous differently
oriented lamellar bundles inter-
secting with each other were identified inside
the grains, independent of grain orientation
(denoted by dashed lines in Fig. 3, A and B,
and fig. S3); such bundles were rarely detected in the
surface of the GDS tensile deformed at 293 K
(33). Atomic-resolution, aberration-corrected 
high-angle annular dark-field scanning trans-
mission electron microscopy (HAADF-STEM)
views further revealed that these interlocked
lamellae are multiple SF bundles and twin
boundary (TB) segments on different {111} slip
planes, with an average length of 22 nm
(Fig. 3C). The spacing between
neighboring SFs or TBs is only 2.8 nm on aver-
age, corresponding to a high density of defects
(~1.3 x 10"” m””) in the lamellar bundles. 

At tensile strain up to 40%, the mutually in- 
tersected planar interfaces notably proliferate, 
further subdividing the microsized topmost 
surface grains into nanometer-sized mosaics 
(Fig. 3, D to F). The initial equiaxed dislocation 
cells and walls in these grains have disappeared; 
instead, abundant mosaic-shaped substruc- 
tures prevail in the grain interior (Fig. 3F), with 
a mean size of ~50 nm. Our HAADF images 
identify more details indicating that these 
mosaics are further refined by individual tiny 
mosaics, which contain extremely fine SFs and 
of the same compositions (with homogeneous, gradient-grained structures) at
293 K and 77 K (34-36), other MPE alloys (37-42), and cryogenic steels
(21, 43, 44) tensioned from 4.2 K to 77 K in the literature. The hollow five-pointed
star denotes the property of the topmost GDS surface with the use of the 
strength value estimated from the Hv/3 and uniform elongation data from tensile 
test of GDS layer. The error bars represent standard deviations from more 
than three independent tensile tests. DC, dislocation cell; UFG, ultrafine grain; 
HGS, heterogeneous grain structure; FG, fine grain; NG, nanograin. 


twins and are separated primarily by low-angle 
misorientation (<15°) (Fig. 3G). We named 
these structures tiny SFed mosaics. The spacing 
between adjacent SFs or TBs in tiny SFed mo- 
saics is ~1.8 nm, whereas their length is ~20 nm 
on average. 

The presence of ultrafine SFs and TBs at 77 K 
indicates an extremely high density of planar 
interfacial defects, ~3.1 x 10’ m~’, which is 
about six times as high as that (~5.2 x 10°° m~) 
in GDS deformed at 293 K (33). By contrast, in-
tensive dislocations along primary slip planes,
with a small quantity of parallel SFs and de- 
formation twins, dominate the plastic strain 
of CG at 77 K (fig. S5) and the core of bulk 
GDS samples at 77 K, similar to most single fcc 
MPE alloys with low SFEs at low temperature 
(27, 28, 45, 46). 

Quantitative composition analysis at the 
nanometer scale by means of energy-dispersive 
x-ray spectroscopy mapping displays a homo- 
geneous distribution for each element without 
Fig. 3. Microstructural evolution of the GDS surface layer during tension
at 77 K. (A and B) The cross-sectional SEM (A) and bright-field TEM (B) images
of the GDS surface layer at 3% tensile strain, showing the widespread occurrence
of two sets of SF bundles (indicated by the white arrows) at different {111} slip
planes and across multiple dislocation cell patterns in the topmost surface. The
white line with double arrows in (A) denotes the loading axis (LA). The corresponding
SAED patterns in (B) contain two sets of parallel streaks from SFs (along two
[111] directions, noted by the white arrows). (C) An aberration-corrected HAADF-
STEM image taken from one SF bundle, revealing an ultrahigh density of SFs
and TBs. (D and E) The cross-sectional SEM images of the GDS surface layer at


any detectable compositional segregation in 
SFed mosaics (fig. S4). This is analogous to that 
in the as-prepared state (33), possibly owing 
to the suppressed atomic diffusion of elements 
when performing the tensile tests in liquid 
nitrogen. 

Synchrotron x-ray diffraction (SXRD) scans 
from the topmost surface to the core of both 
GDS and CG samples after tension showed
higher SF and twin probabilities at 77 K com-
pared with those at 293 K at the same strains
(Fig. 4A and fig. S6), which verifies the readier
formation of SFs and twins at 77 K, as previous-
ly reported in other studies (24, 26, 39, 45-47).
Notably, we obtained a higher SF and
twin probability in the surface GDS layer
40% tensile strain, showing an example of one grain containing denser mutually
inclined planar interfaces (denoted as white dashed lines). (F) HAADF-STEM image
showing massive nanomosaics in the interior of one grain. The inset at the bottom left
is the corresponding SAED pattern. (G) Atomic-resolution HAADF-STEM image
showing a typical example of several tiny mosaics of ~10 nm in size (sketched
as red dashed curves) containing atomic-scale SFs on different {111} slip planes. The
numbers in red denote the misorientation angles of the interfaces separating the
adjacent SFed mosaics. (H) Schematic illustration showing the dynamic structural
refinement process of the initial dislocation cell structure enabled by the formation
of extensive nanomosaics during the tensile experiment at 77 K.


at 3% strain at 77 K, even higher than that of CG
loaded in tension to a strain of 40% at 293 K.
As the tensile strain increases (to 40%) at
77 K, a further-elevated SF and twin prob-
ability is detected in the surface GDS
layer compared with CG and the core of the GDS
sample, which is fully consistent with our ex
situ SEM and TEM observations (Fig. 3). Such
Fig. 4. The evolution of SF probability and dislocation density in the GDS and CG alloys after tension
at 77 K. (A and B) The evolution of SF probability (A) and full dislocation density (B) from the topmost
surface to the core of GDS and CG (tensile strained 3% and 40%) samples at 293 K and 77 K, determined by
the extended convolutional multiple whole profile (eCMWP) fitting analysis of SXRD spectra. SF
probabilities in the 3% deformed GDS sample at 293 K were too low to be extracted.


an elevated SF and twin probability is also 
expected in the surface GDS samples because 
they have the same GDS structure (i.e., size 
and distribution of dislocation cells) as bulk 
GDS samples. 

The eCMWP analysis of the SXRD
profiles also reveals a much higher
dislocation density in the GDS sam-
ples at 77 K (up to 2.0 x 10° m ? at a 40% tensile 
strain) compared with those in the GDS at 293 K 
and the CG counterparts at 77 K (Fig. 4B). In 
addition, the SXRD profiles from the topmost 
surface to the core of GDS after tensile defor- 
mation still display a stable single fcc phase 
without detectable peaks from any hcp phases 
(fig. S6), also resembling the microstructure 
displayed in Fig. 3. 

Traditional strengthening mechanisms in- 
variably confront a thorny dilemma: They 
strongly resist dislocations yet greatly reduce 
their accumulation density in fine structures 
(diminished strain hardening), even at cryogenic 
temperature (11, 13, 28, 29), as we also explicitly 
demonstrated in Fig. 2E. By contrast, on the
basis of our results, we rationalize that, in
engineered metastable ultrafine dislocation
cells, dynamic structural refinement mediated
by massive, multiorientational, tiny planar SFs
effectively blocks and accumulates
dislocations simultaneously and is responsi-
ble for an exceptionally high strain hardening
rate at 77 K.

Linear dislocations are the primary and 
easiest elementary strain carrier in crystal-
line materials (1, 17). The strain hardening of
polycrystalline metals is essentially attributed 
to the motion, mutual obstruction, and ac- 
cumulation of dislocations on multiple slip 
systems and their progressive self-organization 
into saturated dislocation patterns aided by 
intrinsic high-dynamic recovery of dislocations 
(4, 11, 48). Without exception, intensive multi-
ple dislocations glide and interact, subsequently
resulting in the formation of submicrometer-
sized dislocation cells and walls with a very
high density of dislocations, as we observed
in the as-prepared GDS sample produced by the CT
process (Fig. 1 and Fig. 3H). However, con-
trary to the conventional wisdom that high-strength
metals with a high density of preexisting inter-
faces or dislocations have little potential
for storing further dislocations (1, 10, 11), atomic-scale
multiple-slip SFed mosaics, besides high-density
dislocations, are robustly triggered in highly
dislocated GDS after tension at 77 K. Previous
studies have reported that the introduction 
of nanoprecipitates in MPE alloys facilitates 
multiple SF slips at 77 K (41, 49). However, this
occurs only in highly localized regions of large stress
and strain, mostly adjacent to the nanopre-
cipitates, which have a limited volume fraction.

The nucleation of profuse multiple SFs ini-
tially arises from emission from the
numerous cell walls of GDS, which contain an
ultrahigh dislocation density and thus act as
abundant preexisting intragrain sources.
This emission is favored by the intrinsically
low, locally varying SFE of the MPE alloy, which
contains pronounced local concentration fluctuations [6 to
21 mJ/m² at room temperature (50)]. In par-
ticular, similar to most fcc MPE alloys at low
temperature (26, 28, 39, 45-47), the SFE of
the material decreases on cooling to 77 K,
which further increases the tendency for ac-
tivation and expansion of multiple SFs. In ad-
dition, because it is structural size dependent 
(51, 52), deformation planar faulting, compared 
with the classic linear dislocation activity, is 
more favorable in ultrafine-sized dislocation 
cells at 77 K. The intrinsic reason for the for- 
mation of SFs has been discussed in detail by 
Pan et al. (33). Under such optimal thermo-
dynamical circumstances, the ready, multiple-
glide feature of abundant mobile SFs makes
them the potentially dominant carrier to me- 
diate plastic strains, analogous to the tradi- 
tional dislocations. 

Upon tensile strain, the built-in progressive 
gradient plastic deformation feature in GDS 
substantially enables strain delocalization and 
allows for the dominance of abundant multi- 
ple SF activation to mediate both statistically 
stored and geometrically necessary plastic strain 
in conjunction with dislocations (14-16, 53, 54). 
A strong locally complex multiaxial stress field 
coupled with gradient straining and strain path 
effect from torsion to tension may be built in at 
77 K, providing a markedly high internal driv- 
ing force for the activation and gliding of more 
multiple SFs (4, 55), as reasonably evidenced 
by the ultrahigh microhardness in surface GDS 
and orientation-independent character of the 
numerous topmost grains containing SFs at 77 K 
(figs. S2 and S3). 

A SF has a salient 2D stable planar feature, 
different from the flexible line defect of dislo- 
cations (3). As such, with increasing imposed 
stress, a greater number of multiple 2D SF in- 
terfaces readily propagate from cell walls and 
interact with each other at various slip planes, 
progressively subdividing the initial disloca- 
tion walls into parallel SF bundles (Fig. 3, A to 
C and H). Subsequently, the newly formed
massive SFs and their intensive interactions
with cell walls and/or individual dislocations
also act as additional, dynamically formed, and
sustainable sources that accelerate the storage
rate of planar SFs and linear dislocations therein
via long-range stress (fig. S7). Besides SFs, we
also detected more planar TBs at higher tensile
strain, yet far fewer than the number of SFs (Fig.
3). These TBs form primarily as a by-product of the
continuous glide of SFs with the same sign on con-
secutive {111} atomic planes from cell walls or
of their interactions with dislocations (1, 17, 56).

In particular, the 2D SF interface tends to be-
come more stable at 77 K, corresponding to a
less thermally assisted annihilation process
during tensile straining compared with lin-
ear dislocations. Taken together, as a result of
superhigh-density SF and TB accumulation, the
continuous net pumping of SFs and TBs naturally
causes progressive in situ structural refine-
ment from submicrometer-sized cell walls into
tinier SFed mosaics, containing extremely
fine SFs and TBs as well as a high density of
dislocations (Fig. 3, D to H, and Fig. 4B).

The prominent structural feature of SFed
mosaics is that they contain a high density of
multiple short, thin SFs and TBs. These
arise from the synergistic combination
of the GDS structure, the rugged local con-
centration fluctuations intrinsic to MPE
alloys, and the cryogenic temperature.
Because SFs and TBs are the lowest-energy
basic structural
units of deformation, the deformation-induced
transition from metastable dislocation cells
to further refined tiny SFed mosaics is thus
considered to be an autonomous structural
evolution process close to the minimum free
energy, obeying the classic strain harden-
ing principles of metals, i.e., self-organization
and low-energy structures (4, 48).

As demonstrated by experiments and mo- 
lecular dynamic simulations (56, 57), nanometer- 
spaced SF interfaces are able to hinder dislocation 
motion as effectively as nanoscale TBs and GBs. 
Therefore, the dynamic structural refinement
process from the initial cells to finer SFed mo-
saics built into grain interiors plays dual roles:
It not only effectively accumulates dislocations
but also hinders them simultaneously, great-
ly increasing slip resistance. This process, to-
gether with the pronounced forest strength-
ening induced by intensive dislocation-dislocation
interactions, dynamically hardens the GDS surface
layer much faster and to a substantially higher level than
the bulk GDS and CG at 77 K throughout the
tensile test (Fig. 2B).

By conducting tensile load-unload-reload
tests (11, 14, 15, 37), as shown in fig. S7, we ob-
served that both back stress and effective stress,
partitioned from the true flow stress, mono-
tonically increase with increasing tensile strain
for GDS at 77 K and are higher than those of the
CG counterpart through the entire uniform
straining range. In particular, the increase in
the internal back stresses (up to 1.4 GPa before 
necking) is the decisive contributor to the flow 
stress of bulk GDS after yielding, accounting 
for ~70% of the total flow stress during strain 
hardening. The enhanced long-range back 
stress hardening arises from initial ultrafine 
dislocation cells and later dynamic genera- 
tion of tiny mosaics, whereas the high short- 
range effective stress originates from forest 
dislocation hardening of higher-density dis- 
locations. Coupled high back stress and effec- 
tive stress are collectively responsible for the 
excellent strain hardening of GDS throughout 
tensile straining at cryogenic temperature (Fig. 
2, C and D). As such, the sustainable, high strain
hardening rate consequently gives GDS
samples a high ductility (or strain hardening
coefficient) at gigapascal flow stresses after
yielding at ~0.9 GPa (Fig. 2B).
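
The stress partition described above can be checked with simple arithmetic. The sketch below assumes the usual additive split, flow stress = back stress + effective stress, and further assumes that the ~70% back-stress share applies at the point where the back stress reaches 1.4 GPa; the text gives both figures but does not state that they coincide. The function name is hypothetical.

```python
def partition_flow_stress(back_stress_gpa, back_fraction):
    """Return (flow stress, effective stress) in GPa.

    Assumes flow stress = back stress + effective stress and that
    back_fraction is the back-stress share of the flow stress.
    """
    flow = back_stress_gpa / back_fraction
    effective = flow - back_stress_gpa
    return flow, effective


# Values quoted in the text: back stress up to 1.4 GPa, ~70% of flow stress.
flow, effective = partition_flow_stress(1.4, 0.70)
print(f"implied flow stress ~ {flow:.1f} GPa, effective stress ~ {effective:.1f} GPa")
```

Under these assumptions the implied flow stress is about 2 GPa, which is of the same order as the true ultimate tensile strength reported for the bulk GDS sample.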

Our experimental observations point to an 
unusual strain hardening mechanism that is 
readily triggered by the formation of extreme- 
ly refined SFed mosaics in a single fcc phase 
MPE alloy with gradient dislocation structure 
at cryogenic temperature, giving it an unprec- 
edentedly high strain hardening capacity, even 
beyond that of its CG counterpart. This dy- 
namic SFed mosaics-induced strain hardening 
mechanism at cryogenic temperature echoes 
our earlier results of SF-induced plasticity as 
well as the exceptional strength and ductility 
in the GDS alloy at room temperature (33, 58). 
Evidently, the underlying dominant atomic-
scale planar deformation faulting activities in
the crystalline lattice not only act as an alternative,
elementary carrier of crystal plasticity but also
induce robust strain hardening compared with
linear dislocations. The combination of gradient
dislocation architectures and nanosized SFed
mosaics is of great importance for understand-
ing the fundamental strain hardening mecha-
nisms of physical metallurgy and provides a
different paradigm for developing strong and
ductile materials, especially for a wide spec-
trum of cryogenic applications, such as deep
space and ocean exploration, liquefied natural
gas storage, and cryogenic physics.


Direct observation of chirality-induced spin
selectivity in electron donor-acceptor molecules


Hannah J. Eckvahl, Nikolai A. Tcyrulnikov, Alessandro Chiesa, Jillian M. Bradley,
Ryan M. Young, Stefano Carretta, Matthew D. Krzyaniak, Michael R. Wasielewski


The role of chirality in determining the spin dynamics of photoinduced electron transfer in donor- 
acceptor molecules remains an open question. Although chirality-induced spin selectivity (CISS) has 
been demonstrated in molecules bound to substrates, experimental information about whether this 
process influences spin dynamics in the molecules themselves is lacking. Here we used time-resolved 
electron paramagnetic resonance spectroscopy to show that CISS strongly influences the spin dynamics 
of isolated covalent donor-chiral bridge-acceptor (D-Bχ-A) molecules in which selective photoexcitation
of D is followed by two rapid, sequential electron-transfer events to yield D•+-Bχ-A•−. Exploiting
this phenomenon affords the possibility of using chiral molecular building blocks to control electron
spin states in quantum information applications.


Molecules offer a wide variety of quan-
tum properties that could potentially
be exploited in qubit architectures for 
quantum information science (QIS) 
(1, 2). Moreover, molecules afford the 
ability to tailor these properties as applications 
dictate while controlling structure with atomic 
precision. One such property of growing in- 
terest is molecular chirality, which plays an 
essential role in many chemical reactions and 
nearly all biological processes. Naaman, Waldeck, 
and coworkers presented evidence of the rela- 
tionship between molecular chirality and electron 
spin (3, 4) when they observed that electrons 
photoemitted from a gold surface coated with 
a thin film of DNA have a preferred spin state, 
a phenomenon now known as chirality-induced 
spin selectivity (CISS) (5). Subsequent experi- 
ments with molecules bound to metallic, semi- 
conductor, or magnetic substrates have confirmed 
a connection between electron motion and spin 
projection along the chiral axis, which is se- 
lected to be parallel or antiparallel to the motion 
depending on the handedness of the chiral 
molecule (5-9). The spin selectivity of the ef- 
fect can be very high, even at room tempera- 
ture, and its theoretical foundations are still 
being explored (10-17). However, a key prob- 
lem hindering a fundamental understand- 
ing of CISS is that it is difficult to separate 
the role of the substrate from that of the chi- 
ral molecule. 
Hence, it is crucial to investigate how CISS 
affects electron spin dynamics in molecules 
undergoing electron transfer that are not bound 
to substrates. Achieving this understanding would
make the design of chiral molecular building 
blocks to manipulate electron spin states pos- 
sible, which has potential for QIS applications. 
In particular, the occurrence of CISS at the 
molecular level has been proposed as an en- 
abling technology for quantum applications, 
e.g., solving key issues like single-spin read-
out and high-temperature spin qubit initial-
ization (6).

In this work, we show direct evidence of CISS
in isolated covalent donor-chiral bridge-acceptor
(D-Bχ-A) molecules in which selective photo-
excitation of D to its lowest excited singlet state
(¹*D) is followed by two rapid, sequential electron-
transfer events:
¹*D-Bχ-A → D•+-Bχ•−-A → D•+-Bχ-A•− (Fig. 1A).
If formation of D•+-Bχ-A•− occurs
in <1 ns and the effect of chirality is neglected,
the resulting entangled electron spin pair is
prepared initially in a nearly pure singlet state,
¹(D•+-Bχ-A•−). These states are commonly re-
ferred to as spin-correlated radical pairs (SCRPs)
and have been studied in systems ranging from
photosynthetic proteins (18-21) and related mod-
el systems (22-25) to DNA hairpins (26-30).
However, in all these cases, no consideration 
was given to the possible influence of chirality 
on SCRP spin dynamics. 
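
For readers less familiar with the spin language, the pure singlet state of a two-electron radical pair can be written out explicitly. The sketch below is textbook spin-1/2 algebra, not the authors' simulation code, and it deliberately ignores any effect of chirality (the case described above). All variable names are illustrative.

```python
import numpy as np

# Single-spin basis states: |up> and |down>.
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# Two-spin singlet |S> = (|up,down> - |down,up>) / sqrt(2) and, for
# comparison, the triplet |T0> = (|up,down> + |down,up>) / sqrt(2).
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2.0)
triplet_0 = (np.kron(up, down) + np.kron(down, up)) / np.sqrt(2.0)

# Spin-1/2 operators (in units of hbar).
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)


def total_spin_squared():
    """Build the two-spin operator S^2 = (S1 + S2)^2 as a 4x4 matrix."""
    s_squared = np.zeros((4, 4), dtype=complex)
    for s in (sx, sy, sz):
        total = np.kron(s, identity) + np.kron(identity, s)
        s_squared += total @ total
    return s_squared


S2 = total_spin_squared()
s2_singlet = float(np.real(singlet.conj() @ S2 @ singlet))
s2_triplet = float(np.real(triplet_0.conj() @ S2 @ triplet_0))
print(f"<S^2> singlet = {s2_singlet:.3f}, triplet T0 = {s2_triplet:.3f}")
```

The singlet has total spin zero (⟨S²⟩ = 0), whereas the triplet has S = 1 (⟨S²⟩ = S(S + 1) = 2); the two states differ only in the relative sign of the two product terms.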

To demonstrate the occurrence of CISS, we
have synthesized pairs of covalent D-Bχ-A
enantiomers, (R)-1-h9 (-d9) and (S)-1-h9 (-d9),
where D is either nondeuterated (-h9) or fully
deuterated (-d9) peri-xanthenoxanthene (PXX)
(31), Bχ is a pair of naphthalene-1,8-dicarboximides
that are linked at their 4-positions to form
an enantiomeric pair of axially chiral dimers
(R)-NMI2 and (S)-NMI2 (32), and A is naphthalene-
1,8:4,5-bis(dicarboximide) (NDI) (supplemen-
tary materials; figs. S1 and S2). The structures of
(R)-1-h9 (-d9) and (S)-1-h9 (-d9) and the corre-
sponding achiral reference molecules 2-h9 (-d9)
are shown in Fig. 1B. The enantiomers were sepa-
rated by HPLC with a chiral stationary phase
(fig. S3), and their circular dichroism spectra are
given in fig. S4. We have characterized the charge
separation and recombination dynamics of these
molecules with transient optical absorption (TA)
spectroscopy, and the CISS effect on their spin
dynamics with time-resolved electron paramag-
netic resonance (EPR) spectroscopy using either
continuous (TREPR) or pulsed microwave ra-
diation (pulse-EPR).

We found that CISS yields characteristic features in the TREPR spectra
of the photogenerated PXX•+-NMI2-NDI•− SCRP in (R)-1-h9 (-d9) and
(S)-1-h9 (-d9), which are absent in achiral 2-h9 (-d9), when the
direction of electron transfer is oriented orthogonal to the applied
static magnetic field direction, in agreement with simulations.
Conversely, the corresponding spectra of PXX•+-NMI2-NDI•− are
practically identical when the field is parallel to the
electron-transfer direction.


Time-resolved EPR spectroscopy


Samples of (R)-1-h9 (-d9), (S)-1-h9 (-d9), and 2-h9 (-d9) were each
prepared in the nematic liquid crystal 4-cyano-4'-(n-pentyl)biphenyl
(5CB), which was aligned in a magnetic field at 295 K, then rapidly
frozen to 85 K, which aligns the long axes of these molecules along the
magnetic field. The orientation of the molecules aligned in frozen 5CB
can then be rotated relative to the applied magnetic field direction.
Because solid 5CB is an optically scattering medium, to assess the
photo-driven charge-separation dynamics of these molecules at low
temperature, we used both femtosecond and nanosecond TA spectroscopy,
substituting glassy butyronitrile for 5CB at 105 K. Transient absorption
spectra and kinetics are given in figs. S5 and S6. The data show that in
each case, ultrafast two-step charge separation occurs in <0.2 ns to
give PXX•+-NDI•−, which recombines to its ground state with a time
constant τ = 46 to 66 μs, providing ample time for TREPR measurements.
The presence of a ~0.35-T static magnetic field in the TREPR experiments
does not affect the ultrafast electron-transfer reactions because the
Zeeman interaction (~0.3 cm⁻¹) at that field strength is much less than
the adiabatic energy gaps (~20 to 80 cm⁻¹) for these reactions (see
table S1 and the supplementary materials for details).
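As a quick consistency check on the field-strength argument above, the electron Zeeman splitting g·μ_B·B_0 can be converted to wavenumbers and compared with the quoted energy gaps. The sketch below is only illustrative: it assumes the free-electron g value (not a value reported in this article), and the function name and hand-typed constants are ours.

```python
# Electron Zeeman splitting at the TREPR field, expressed in cm^-1.
# Assumption: g = 2.0023 (free-electron value), used only for illustration.
MU_B = 9.2740100783e-24   # Bohr magneton, J/T (CODATA 2018)
H = 6.62607015e-34        # Planck constant, J s
C_CM = 2.99792458e10      # speed of light, cm/s


def zeeman_wavenumber(b_tesla, g=2.0023):
    """Return g * mu_B * B / (h * c) in cm^-1."""
    return g * MU_B * b_tesla / (H * C_CM)


splitting = zeeman_wavenumber(0.35)
print(f"Zeeman splitting at 0.35 T: {splitting:.2f} cm^-1")
for gap in (20.0, 80.0):
    # Ratio of the quoted adiabatic energy gap to the Zeeman term.
    print(f"gap of {gap:.0f} cm^-1 is about {gap / splitting:.0f} times larger")
```

Under these assumptions the splitting comes out near the ~0.3 cm⁻¹ quoted in the text, well below even the smallest quoted gap.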

We used pulse-EPR techniques to assess the quality of the alignment of
(R)-1-h9 (-d9), (S)-1-h9 (-d9), and 2-h9 (-d9) in 5CB by measuring the
isotropic exchange (J) and dipolar (D) spin-spin interactions for their
photogenerated SCRPs, where D(θ) = d(1 − 3cos²θ) and
d = 52.04 MHz·nm³/r_DA³ in the point-dipole approximation, which gives
detailed distance and orientation information as defined by the
Hamiltonian in eq. S3. If photogeneration of the SCRP is followed by a
Hahn echo microwave pulse sequence (π/2 pulse - delay τ - π pulse -
delay τ - spin echo) and τ is scanned, coherent oscillations between the
eigenstates of the SCRP Hamiltonian |Φ_A⟩ and |Φ_B⟩ (see below) that are
related to both J and D(θ) modulate the spin-echo amplitude (21, 33-37).
When this experiment is performed on spin-coherent SCRPs, the echo
appears out of phase (i.e., in the detection channel in quadrature to
the one in which it is expected) and is therefore termed out-of-phase
electron spin echo envelope modulation (OOP-ESEEM) (21, 33-37). For
large SCRP distances, r_DA, J can be neglected and the OOP-ESEEM
oscillation frequency is approximately 2d when θ = 0°, and d when
θ = 90°. Thus, OOP-ESEEM can be used to measure SCRP distances for a
given angle of the dipolar axis relative to the magnetic field (28,
36-38). The dipolar axis in SCRPs connects the centroids of the spin
distributions of the two radicals. Figures S7 and S8 show the OOP-ESEEM
data for (R)-1-h9 (-d9), (S)-1-h9 (-d9), and 2-h9 (-d9), assuming that
their dipolar axes are aligned parallel or perpendicular to the magnetic
field. Fitting the OOP-ESEEM data showed that the measured
PXX-h9•+-NDI•− distances of (R)-1-h9, (S)-1-h9, and 2-h9 were
2.48 ± 0.01, 2.48 ± 0.01, and 2.28 ± 0.01 nm, respectively, whereas the
corresponding PXX-d9•+-NDI•− distances of (R)-1-d9, (S)-1-d9, and 2-d9
were 2.53 ± 0.01, 2.51 ± 0.01, and 2.29 ± 0.02 nm, respectively (table
S2). These experimental distances are consistent with the
center-to-center distances between PXX and NDI determined from density
functional theory calculations on (R)-1-h9, (S)-1-h9, and 2-h9, where
r_DA = 2.59, 2.60, and 2.40 nm, respectively (fig. S9 and tables S3 to
S5). The agreement between the experimental and calculated distances
shows that the D•+-Bχ-A•− SCRPs are well aligned along the magnetic
field direction in frozen 5CB.
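The point-dipole relations quoted above can be turned into a small calculator that maps a radical-pair distance to the dipolar coupling and back. This is a minimal sketch, not the fitting procedure used for the OOP-ESEEM data: the function names are hypothetical, J is neglected as in the large-distance limit described in the text, and only the 52.04 MHz·nm³ prefactor and the distances are taken from the article.

```python
import math

D_PREFACTOR = 52.04  # MHz nm^3, point-dipole prefactor quoted in the text


def dipolar_d(r_nm):
    """d = 52.04 MHz nm^3 / r^3 for a radical-pair distance r in nm."""
    return D_PREFACTOR / r_nm ** 3


def dipolar_D(r_nm, theta_deg):
    """D(theta) = d * (1 - 3 cos^2 theta), theta measured from the field."""
    c = math.cos(math.radians(theta_deg))
    return dipolar_d(r_nm) * (1.0 - 3.0 * c * c)


def distance_from_d(d_mhz):
    """Invert d = 52.04 / r^3 to recover the distance in nm."""
    return (D_PREFACTOR / d_mhz) ** (1.0 / 3.0)


for label, r in (("chiral 1-h9", 2.48), ("achiral 2-h9", 2.28)):
    print(f"{label}: d = {dipolar_d(r):.2f} MHz, "
          f"D(0 deg) = {dipolar_D(r, 0.0):.2f} MHz, "
          f"D(90 deg) = {dipolar_D(r, 90.0):.2f} MHz")
```

The sign change of D(θ) between the two orientations is what reverses the polarization pattern discussed later in the text; |D| is 2d at θ = 0° and d at θ = 90°, matching the approximate oscillation frequencies stated above.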

The TREPR spectra of aligned (R)-1-h9, (S)-1-h9, and 2-h9 were obtained
by photoexciting the samples with a 450-nm, 7-ns laser pulse and
monitoring the magnetization with continuous microwaves by using direct
detection (supplementary materials). The spectra obtained 100 ns after
the laser pulse are shown in Fig. 2. When the long axes of these
molecules are aligned parallel to the magnetic field direction (θ = 0°),
both enantiomers as well as the achiral reference molecule gave the same
spectra (Fig. 2, A and C). Rotating the samples so that the long axes of
(R)-1-h9, (S)-1-h9, and 2-h9 are aligned perpendicular to the magnetic
field direction (θ = 90°) resulted in the appearance of outer wings in
the spectra of chiral (R)-1-h9 and (S)-1-h9 (Fig. 2B). No such
enhancement was observed for achiral 2-h9. As explained below, we posit
that these new features result from the contribution of CISS to the
formation of the SCRPs in (R)-1-h9 and (S)-1-h9. Deuteration of PXX•+
narrows the overall linewidth of (R)-1-d9, (S)-1-d9, and 2-d9 while
retaining the same orientation dependence of the signal (Fig. 2, C and
D).

Fig. 1. Electron transfer pathways and molecular structures. (A) Electron transfer and intersystem-crossing
pathways in a D-Bχ-A system with no applied magnetic field, where k_CS1 and k_CS2 are the charge-separation rate
constants, k_ST is the singlet-triplet mixing rate constant, and k_CRS and k_CRT are the charge recombination rates through the
singlet and triplet channels, respectively. (B) Structures of chiral (R)-1 and (S)-1 and achiral 2. The steric constraints
imposed by linking the two NMI groups in (R)-1 and (S)-1 result in stable enantiomers that have axial chirality.


Effect of CISS on radical pair spin dynamics 


In the molecules described here, the D•+-A•− distances are >23 Å, so
that the spin-spin interactions J and D are small relative to the
~0.35-T applied magnetic field. Thus, the Zeeman term is by far the
leading term in the SCRP spin Hamiltonian (eq. S3), so that the SCRP
wavefunctions |S⟩ = (1/√2)(|↑↓⟩ − |↓↑⟩) and |T_0⟩ = (1/√2)(|↑↓⟩ + |↓↑⟩),
which are magnetic field invariant, remain close in energy, whereas
|T_+1⟩ = |↑↑⟩ and |T_−1⟩ = |↓↓⟩ are well separated in energy from both
|S⟩ and |T_0⟩. In particular, both |T_+1⟩ and |T_−1⟩ are eigenstates of
the spin Hamiltonian, whereas |S⟩ and |T_0⟩ are not eigenstates because
of the different electronic g factors and hyperfine fields of the two
spins. Coherent mixing of |S⟩ and |T_0⟩ yields
|Φ_A⟩ = cos φ |S⟩ + sin φ |T_0⟩ and |Φ_B⟩ = −sin φ |S⟩ + cos φ |T_0⟩
(Fig. 3A), which are eigenstates of the spin Hamiltonian, where the
angle φ in the mixing coefficients is derived from the magnetic
parameters of the SCRP (supplementary materials) (39-41).
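The state definitions above can be checked numerically. The following sketch builds the four two-spin states in the product basis, forms the mixed states for a given mixing angle, and evaluates their overlap with the singlet, i.e., the populations a pure singlet precursor would place on each level. The basis ordering, the function names, and the 20° mixing angle are assumptions made only for this illustration; the actual angle follows from the magnetic parameters in the supplementary materials.

```python
import math

# Basis order (an assumption of this sketch): |up,up>, |up,down>, |down,up>, |down,down>
R2 = 1.0 / math.sqrt(2.0)
S = (0.0, R2, -R2, 0.0)         # singlet
T0 = (0.0, R2, R2, 0.0)         # triplet, m = 0
T_PLUS = (1.0, 0.0, 0.0, 0.0)   # triplet, m = +1
T_MINUS = (0.0, 0.0, 0.0, 1.0)  # triplet, m = -1


def dot(u, v):
    """Inner product of two real state vectors."""
    return sum(a * b for a, b in zip(u, v))


def mixed_states(phi):
    """Return (Phi_A, Phi_B) for mixing angle phi in radians."""
    c, s = math.cos(phi), math.sin(phi)
    phi_a = tuple(c * a + s * b for a, b in zip(S, T0))
    phi_b = tuple(-s * a + c * b for a, b in zip(S, T0))
    return phi_a, phi_b


def singlet_populations(phi):
    """Population of each level when the precursor is a pure singlet."""
    phi_a, phi_b = mixed_states(phi)
    levels = {"Phi_A": phi_a, "Phi_B": phi_b, "T+1": T_PLUS, "T-1": T_MINUS}
    return {name: dot(vec, S) ** 2 for name, vec in levels.items()}


phi = math.radians(20.0)  # hypothetical mixing angle
for name, pop in singlet_populations(phi).items():
    print(f"{name}: {pop:.3f}")
```

The populations on the two mixed states are cos²φ and sin²φ, and the outer triplet levels stay empty, which is the starting point for the spin-polarized spectrum of an achiral SCRP.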

Fig. 2. TREPR spectra. TREPR spectra of (R)-1-h9, (S)-1-h9, and 2-h9 (A and B) and (R)-1-d9, (S)-1-d9,
and 2-d9 (C and D) oriented in the nematic liquid crystal 5CB at 85 K and 100 ns after a 450-nm, 7-ns laser
pulse with the long axis of each molecule 0° [(A) and (C)] and 90° [(B) and (D)] relative to the applied
magnetic field direction. (Insets) The spectra are shown with their intensities expanded to highlight features
characteristic of CISS. T_DAF, time after laser pulse.

In the ultrafast electron-transfer regime observed here, the initial
spin state for an achiral SCRP is the entangled singlet |S⟩ state that
yields populations only on |Φ_A⟩ and |Φ_B⟩. Therefore, four allowed
transitions occur between |Φ_A⟩ and |Φ_B⟩ and the initially unpopulated
|T_+1⟩ and |T_−1⟩ states, giving rise to a spin-polarized
(out-of-equilibrium) EPR spectrum. When θ = 0°, this results in a
typical (e, a, e, a) spin-polarization pattern (low to high field, where
a = enhanced absorption and e = emission) because D(θ) < 0 (39-41).


Conversely, when θ = 90° the pattern is reversed (a, e, a, e) because
D(θ) > 0. Because the g tensors of PXX•+ and NDI•− are very similar,
i.e., [2.0045, 2.0045, 2.0031] (42) and [2.0047, 2.0047, 2.0027] (43),
respectively, the expected SCRP polarization patterns (e, a, e, a) or
(a, e, a, e) are reduced to broadened (e, a) or (a, e) patterns, as
observed experimentally for the achiral reference molecules 2-h9 and
2-d9 (Fig. 2, blue traces). Our OOP-ESEEM results show that the dipolar
axis of each SCRP is well aligned with the long axis of each molecule so
that the dipolar axis and the chirality axis of (R)-1-h9 (-d9) and
(S)-1-h9 (-d9) are nearly parallel. The angle θ between this axis and
the applied magnetic field (B_0) direction is depicted in Fig. 3, B and
E, for the parallel and perpendicular orientations, respectively. CISS
mixes triplet character into the initial singlet SCRP; thus, the initial
populations of |Φ_A⟩, |Φ_B⟩, |T_+1⟩, and |T_−1⟩ and the corresponding
transition intensities are predicted to change as well (44-47). If CISS
is the sole contribution to the spin dynamics and θ = 0° (Fig. 3D), then
the state following electron transfer would be |↑↓⟩ or |↓↑⟩, depending
on the chirality of the enantiomer and whether B_0 is parallel or
antiparallel to the electron motion. Given that the typical alignment of
linear D-B-A molecules within nematic liquid crystals is not
unidirectional, B_0 has equal probability of being parallel or
antiparallel to the electron motion and hence, if coherences are lost,
the initial state is an equal mixture of |↑↓⟩ and |↓↑⟩ and is thus
equivalent to having a pure initial |S⟩ state. This situation is shown
schematically in Fig. 3C, where the blue and red traces depict the
idealized TREPR spectra expected when CISS contributes 0 and 100%,
respectively. Indeed, the observed spectra of both enantiomers as well
as the achiral reference molecule are practically identical for θ = 0°
(Fig. 2, A and C).

By contrast, when the chirality axis is orthogonal to B_0 (Fig. 3E), the
initial state is very different in the presence or absence of CISS. In
particular, the CISS contribution initially populates |T_+1⟩ and |T_−1⟩
(Fig. 3G and eq. S12). Therefore, if the SCRP spin state has a 100% CISS
contribution, the TREPR spectra have a nearly opposite intensity pattern
with respect to the case in which CISS does not contribute. This is
shown in Fig. 3F, where the blue and red lines in the idealized TREPR
spectra correspond to the intensity for the pure |S⟩ (I_S) and pure CISS
(I_CISS) initial conditions, respectively.

According to recent theoretical models (44-47) that describe the
influence of CISS on SCRP spin dynamics, in cases for which the CISS
contribution is not 100%, the initial state will be a superposition or a
mixture of |S⟩ and |T_0⟩ along the chiral axis direction, making the
detection of CISS less obvious. In fact, the spectral line intensities
in this case are the weighted sum of I_S and I_CISS, which occur at the
same resonance fields and tend to cancel out (see details in the
supplementary materials). The key to unraveling CISS and pure singlet
contributions to the SCRP spin state in the molecules studied here is
the observation of a larger EPR linewidth that occurs when CISS
contributes. Indeed, the sum of the two contributions (Fig. 3F, black
trace) yields a signal that displays lateral wings of opposite sign and
central features that are narrower than those produced in the absence of
CISS, exactly as observed experimentally in Fig. 2, B and D. These
features are unambiguous signatures of CISS because they cannot be
produced starting from an initial |S⟩ state, where the polarization
pattern is fixed to (a, e) for θ = 90° by the sign of D(θ) (39-41).

The larger linewidth obtained for the CISS initial state arises from the
very different dependence of I_S and I_CISS on the degree of coherent
mixing in the eigenstates. Indeed, |I_CISS| increases with increasing
entanglement (φ → 0) whereas |I_S| decreases (fig. S10). Exploring the
variation of the intensity by varying the composition of the |Φ_A⟩ and
|Φ_B⟩ eigenstates is made possible by the presence of several nuclear
spins and by distributed magnetic parameters, e.g., dipolar couplings,
often termed strain. Therefore, moving from the center of each
transition, i.e., the center of the distributions of the magnetic
parameters and hyperfine fields, to the tail of the lineshape
corresponds to changing the composition of the eigenstates, which
produces different linewidths for different initial states. If
entanglement in the eigenstates is larger in the tails of the spectrum,
the CISS contributions result in magnetic field-dependent broadening,
giving rise to lateral contributions to the lineshape of opposite sign
with respect to the central features (Fig. 3F, black trace).
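The way a broader, sign-inverted contribution produces wings can be illustrated with a toy lineshape model. The sketch below is not the spin-dynamics simulation used in this work: it simply assumes area-normalized Gaussian lines, places an (a, e) singlet pattern and an inverted, broader CISS pattern at the same two resonance positions, and adds them with weights. The positions, widths, function names, and default weight are hypothetical choices for illustration only.

```python
import math


def gaussian(x, center, sigma):
    """Area-normalized Gaussian line."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def two_line_pattern(x, x0, sigma, low_field_sign):
    """Two lines at -x0 and +x0 with opposite signs (low-field sign given)."""
    return low_field_sign * (gaussian(x, -x0, sigma) - gaussian(x, x0, sigma))


def total_signal(x, w_ciss=0.59, x0=5.0, sigma_s=1.0, sigma_ciss=2.5):
    """Weighted sum of a singlet (a, e) pattern and an inverted, broader CISS pattern."""
    i_s = two_line_pattern(x, x0, sigma_s, +1.0)        # absorptive at low field
    i_ciss = two_line_pattern(x, x0, sigma_ciss, -1.0)  # opposite sign, larger width
    return (1.0 - w_ciss) * i_s + w_ciss * i_ciss


for x in (-8.0, -5.0, -2.0, 2.0, 5.0, 8.0):
    print(f"x = {x:+.1f}: signal = {total_signal(x):+.4f}")
```

In this toy model the line centers keep the sign of the singlet pattern while the tails take the sign of the broader CISS pattern, which is the qualitative origin of the lateral wings described above.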

To confirm this interpretation, we considered the spectra of partially
deuterated (R)-1-d9 and (S)-1-d9. By strongly diminishing the hyperfine
couplings on one of the two radicals, we changed the distribution of the
eigenstate composition and probed its effect on the lineshape. The
measured spectra for θ = 0° and 90° are shown in Fig. 2, C and D,
respectively. Although no qualitative effect is visible in the parallel
direction as expected, the lateral wings are substantially reduced in
the perpendicular orientation. These spectra were simulated with a
minimal SCRP model with either one spin-½ nucleus (hydrogen atom) on
both NDI•− and PXX•+ or only on NDI•−, the latter of which is the
partially deuterated case. For reasonable values of the hyperfine
couplings, the simulations shown in Fig. 4 reproduce the experimental
behavior.

The intensities of the lateral wings are correctly reproduced by
combining I_S and I_CISS with weights of 41 and 59%, respectively
(Fig. 4). Although a 59% CISS contribution is remarkable, it must be
stressed that this is a minimal model in which the effect of the nuclei
is accounted for only qualitatively, and a full spectral simulation with
all nuclear spins in the fully protonated molecules is very demanding.
However, we have been able to perform the simulation for the deuterated
case, which includes all four ¹H and two ¹⁴N nuclei coupled to the
electron spin in NDI•− and effects of dipolar strain. In this case, the
experimental behavior is very well reproduced with a 47% CISS
contribution, which is still considerable.

Fig. 3. The CISS effect on the spin states of SCRPs. (A) SCRP spin states in the absence of CISS and in the
presence of a static magnetic field B_0 that is much greater than J, D, and the hyperfine interactions in both
radicals for a singlet precursor. The enhanced absorptive (a) and emissive (e) microwave-induced EPR transitions
are indicated. (B) Schematic of chiral molecules aligned parallel to B_0. (C) TREPR spectra with θ = 0° expected
for an achiral SCRP (blue trace) and for a chiral SCRP with an initial state having a 100% (red trace) or partial
CISS contribution (black trace). (D) SCRP spin states for θ = 0° where the initial state has a 100% CISS
contribution. (E) Schematic of chiral molecules aligned perpendicular to B_0. (F) TREPR spectra with θ = 90°
expected for an achiral SCRP (blue trace) and for a chiral SCRP with an initial state having 100% CISS
contribution (red trace) or a partial CISS contribution (black trace, rescaled). (G) SCRP spin states for θ = 90°
where the initial state has a 100% CISS contribution. The widths of the energy levels in (A), (D), and (G)
indicate the population of the initial state, whereas the relative arrow thicknesses in the boxes depict the
transition probabilities.

Further evidence for the validity of this interpretation was obtained by
investigating the time dependence of the TREPR spectra, which reflected
the time evolution of the D•+-Bχ-A•− spin states under the combined
effect of coherent and incoherent terms as described by the stochastic
Liouville equation and the presence of the microwave field
(supplementary materials). Indeed, figs. S11 and S12 show that the time
dependence of the observed intensity of the wings of the spectra is
similar to that of the main peaks, which is in agreement with our
numerical simulations.

The CISS contribution to the spin dynamics of (R)-1-h9 (-d9) and
(S)-1-h9 (-d9) is similar to the ~50% spin polarization that was
recently reported for an axially chiral binaphthalene derivative
covalently linked to a gold film deposited on nickel (48). Although this
single comparison suggests that the observed CISS effect for the
binaphthalene attached to the gold surface may be largely due to the
chiral molecule, additional comparative work is needed on a variety of
systems to warrant such a conclusion.

Fig. 4. Simulations of the TREPR spectra with a minimal model of the SCRP. The model places one
hydrogen nuclear spin-½ on both PXX•+ and NDI•− (A and B), or only on NDI•− (C and D). The nuclear spins
are coupled to each radical with isotropic hyperfine couplings a_NDI = 6.3 MHz and a_PXX = 10 MHz. (Insets) The
simulations are shown with their intensities expanded to highlight features characteristic of CISS. The
complete list of simulation parameters is given in table S7.


Conclusions 


We have found direct evidence of the CISS effect on the spin dynamics of
photogenerated radical ion pairs in molecular electron donor-acceptor
molecules. The observation of CISS in these systems affords
possibilities both for increasing our understanding of this important
phenomenon and for its possible applications. These results show that
substrates or electrodes, with their possibly large spin-orbit
couplings, are not needed for CISS to occur, and that TREPR spectroscopy
can directly access the spin dynamics that result from CISS. This
provides key information to guide theoretical investigations and makes
possible many new targeted experimental studies. In addition, observing
CISS at the molecular level is the first step required to transform this
fundamental phenomenon into an enabling technology for quantum
applications.


Ice-confined synthesis of highly ionized
3D-quasilayered polyamide nanofiltration membranes


Yanqiu Zhang, Hao Wang, Jing Guo, Xiquan Cheng, Gang Han, Cher Hon Lau, Haiqing Lin,
Shaomin Liu, Jun Ma, Lu Shao


Existing polyamide (PA) membrane synthesis protocols are underpinned by controlling diffusion-
dominant liquid-phase reactions that yield subpar spatial architectures and ionization behavior.
We report an ice-confined interfacial polymerization strategy to enable the effective kinetic control
of the interfacial reaction and thermodynamic manipulation of the hexagonal polytype (I_h) ice
phase containing monomers to rationally synthesize a three-dimensional quasilayered PA membrane
for nanofiltration. Experiments and molecular simulations confirmed the underlying membrane
formation mechanism. Our ice-confined PA nanofiltration membrane features high-density ionized
structure and exceptional transport channels, realizing superior water permeance and excellent
ion selectivity.


Membrane-based separation technologies, characterized by high energy
efficiency, low carbon emissions, and high design flexibility, have
evolved into an effective and sustainable approach for alleviating
global water scarcity, environmental remediation, and resource recovery
(1-4). Nanofiltration (NF) has emerged as a cost-effective membrane
separation process for efficiently rejecting small molecules and
multivalent ions and shows immense promise in wastewater treatment,
water softening, and purification processes (5).

Polyamide (PA) membranes prepared by interfacial polymerization (IP) are
the benchmark NF membranes (6, 7). The nanostructures and ionization
behavior of PA membranes have been shown to play a crucial role in
determining membrane separation performance (i.e., permeability and
selectivity). Because the rate of polycondensation reactions between
organic amines and acyl chlorides is several orders of magnitude faster
than the diffusion rate of the amines in the organic phase solution
during IP, the ideal PA NF membrane architectures are difficult to
achieve by conventional diffusion-dominant IP (5). Researchers have
attempted to control the temperature of the organic phase and the
m-phenylenediamine (MPD)-soaked substrate to adjust the transmission
rate of diamine during IP (8-11); however, the current
diffusion-dominant monomer distribution lacks spatial regulation,
resulting in insufficient membrane cross-linking or growth inhibition.
Other molecular engineering approaches have been proposed that tailor
the chemical structures of the monomers, incorporate nanomaterials with
charges onto the membrane surface and interior, and use reactive
monomers with ionizable groups to obtain high low-/high-valent ion
selectivity (12).

Ice chemistry engineering is a promising approach for producing
composite materials with three-dimensional (3D) hierarchical or
anisotropic architectures such as nacre-like ceramics (13), cellular
polymeric composites (14), and hydrogels and aerogels (15, 16). The
ice-melting process can elegantly manipulate the molecular packing
behavior during crystalline synthesis, and the solid-liquid phase
transition significantly changes the reaction kinetics and
thermodynamics of material synthesis (17, 18). The dissolved reactants
in ice can be controllably released owing to the confinement effect of
the hexagonal polytype (I_h) ice phase (19). Herein, we conceived an
ice-confined interfacial polymerization (IC-IP) strategy to synthesize a
highly ionized, 3D-quasilayered PA NF membrane by using an
ice-melting-induced ice/water phase transition. This phase transition
process can deliberately control the diffusion and reaction rates of the
MPD monomers in the I_h ice phase as the confined interface.

IC-IP synthesis was conducted at the interface between the MPD-ice
(Fig. 1, A and B, and figs. S1 and S2) and the n-hexane solution
(freezing point of approximately -95.3°C) containing trimesoyl chloride
(TMC). The MPD aqueous solution was frozen at -20°C. Subsequently, the
TMC/n-hexane solution (precooled to ambient temperature) was added to
the MPD-ice surface. As the frozen MPD-ice slowly melted, MPD was
gradually released. In contrast to the dense, heterogeneous structure of
the conventional PA NF (C-PANF) membrane (Fig. 1C and figs. S3 and S4),
the specific 3D-quasilayered wrinkled architecture (Fig. 1, D and E, and
figs. S5 and S6) for an ice-confined PA NF (IC-PANF) membrane can be
created with ice chemistry engineering.


Structure characterizations of 
IC-PANF membranes 


Scanning electron microscopy (SEM) and atom- 
ic force microscopy (AFM) measurements 
demonstrated that the IC-PANF membrane with 
a regular wrinkled structure was rougher than 
the C-PANF membrane (Fig. 1D and fig. S6), 
resulting in a 2-fold increase in the accessible 
surface area of the IC-PANF membrane (table 
S1 and fig. S7). Furthermore, the IC-PANF 
membrane maintained a dominant fraction of 
subnanometer pores (fig. S8), indicating that 
the IC-PANF membrane has a larger water- 
accessible surface area for faster water transport 
and well-defined selective pores. Transmission 
electron microscopy (TEM) revealed that the 
IC-PANF membrane consisted of a packed, 
interconnected microporous layered internal 
structure (Fig. 1E), and positron annihilation spectroscopy (PAS)
measurements showed that replacing the aqueous MPD solution with MPD-ice
resulted in a significant increase in the fractional internal free
volume of the PA layer (fig. S9). To further probe the
internal structure, the 3D void microstructure 
was reconstructed using focused ion beam- 
SEM, confirming that the IC-PANF mem- 
brane comprised larger microvoid stacks of 
3D-quasilayers (fig. S10), which was consist- 
ent with the TEM and PAS results. By con- 
trast, the C-PANF membrane was mostly dense 
with no observable microscale pores or layers 
(Fig. 1C). The simultaneous increase in surface 
wrinkles and interior microporosity was spe- 
cific to the IC-PANF membrane obtained by 
ice chemistry engineering. 


Mechanism of IC-IP 


To elucidate the synergistic effects of the diffusion kinetics and
ice-melting thermodynamics on the spatial architecture of the PA
membrane during synthesis, the molecular mechanisms and characteristic
time scales of the process were investigated. Ice melting is the least
hindered phase transition, requiring the disruption of the crystalline
order to achieve the liquid phase. With the stepwise increase in
temperature, the molecular diffusion rate in ice is gradually
accelerated. Upon reaching the melting point temperature (T_m), the ice
began to melt and transition into a solid/liquid phase (Fig. 2A).
Therefore, the formation of the IC-PANF membrane during IC-IP synthesis
can be interpreted by breaking down the MPD-ice phase temperature
(T_MPD-p) into four stages: (i) T_MPD-p < T_m, (ii) T_MPD-p = T_m,
(iii) T_MPD-p > T_m, and (iv) T_MPD-p = T_a (ambient temperature)
(fig. S11).

Fig. 1. Ice-confined interfacial polymerization and PA NF membrane structure. (A) Schematic of the IC-IP at the ice/n-hexane interface. (B) Ice I_h phase. (C and
E) TEM cross-sectional images of the C-PANF (C) and IC-PANF (E) membranes. (D) SEM images of the IC-PANF membrane surface under different magnifications.

Compared with bulk water, the higher viscosity of a quasiliquid layer
and the lower potential energy of ice are crucial for developing a
large-area wetting layer (17, 20). Molecular dynamics (MD) simulations
revealed that TMC molecules tended to anchor on the MPD-ice surface
(ice-like liquid water layer) when reaching equilibrium at the first
stage (T_MPD-p < T_m in figs. S12 to S16) (21). With an increase in the
MPD concentration in ice, more TMC molecules were anchored on the ice
surface, and a high adsorption energy (|E_ads|) was achieved by hydrogen
bonding and van der Waals forces (Fig. 2B). This regulated the buildup
of MPD and TMC in a planar 2D space and was conducive to their
polymerization in a constricted interfacial liquid layer (figs. S12 to
S16 and table S2). By contrast, TMC and MPD reacted rapidly in a
traditional IP process, making it difficult to control the distribution
of TMC at the interface. At T_MPD-p = T_m, ice began to melt and formed
an ice/water-hexane interface. At this point, the diffusion kinetics and
reaction thermodynamics of TMC and MPD were vital for PA membrane
formation at the ice/water-hexane interface (22). MD simulations in
Fig. 2C revealed that the MPD diffused more slowly at the
ice/water-hexane interface when equilibrium was reached at T_m. This was
mainly attributed to the confinement of MPD molecules within the ice,
where the absolute value of the binding free energy between ice and MPD
(ΔG_binding(MPD-ice) = −34.9 kcal mol⁻¹) was significantly higher than
that of the MPD with water (ΔG_binding(MPD-water) = −14.6 kcal mol⁻¹)
(Fig. 2A), and the adsorption of TMC on the ice surface (at
T_MPD-p < T_m) synergized with the high viscosity of the MPD-ice/water
mixture (fig. S17). In particular, the confinement effect
of ice also affected the MPD concentration in the ice/water phase. This
is further shown by tracking the release kinetics of MPD monomers from
ice. As the ice started to melt, the MPD concentration initially showed
a rapid increase and then decreased during the ice-melting process
(fig. S18). The larger MPD concentration gradient and lower MPD
diffusion rate led to the formation of a loose, heterogeneous
macroporous structure at T_MPD-p = T_m (fig. S19A). When T_MPD-p was
higher than T_m, all ice crystals melted into water, and the MPD phase
temperature increased from T_m to T_a. At this stage, the reaction rate
between MPD and TMC decreased, as demonstrated by a reduction in the
Gibbs free energy (ΔG) from −38.6 to −42.8 kJ mol⁻¹ (Fig. 2D). MPD
reacted with TMC through the gaps and channels of the previously formed
macroporous layer to create new layers by IP (7). During the melting
process, gases dissolved in the ice were released, which was also
conducive to the formation of an internal microporous-layered structure
(fig. S11) (11, 23, 24). The SEM results and separation performance of
the IC-PANF membrane as a function of temperature during IC-IP were
consistent with the above analysis (figs. S19 and S20). As a result,
this underpinned the formation of a 3D-quasilayered architecture under
the coupling effects of controlled monomer diffusion kinetics and
ice-melting-engineered thermodynamics during IC-IP synthesis.

Fig. 2. Mechanism of IC-IP. (A) Schematic of the structural evolutions of MPD-ice and MPD-water and the Gibbs binding free energy (ΔG_binding) at T_MPD-p = T_m by
MD simulation. (B) The E_ads of TMC with varying concentrations of MPD-ice at the n-hexane/ice interface. (C) MD simulations of the MPD diffusion coefficient at the
initial reaction temperature for conventional IP (28°C) and IC-IP (-2.1°C). (D) ΔG for the MPD interaction with TMC from T_m to T_a by density functional theory (DFT).
(E) X-ray diffraction (XRD) results of MPD-ice at different freezing times.

Fig. 3. Ionization behavior of IC-PANF and C-PANF membranes. (A) Schematic of the high ionization behavior of the IC-PANF membrane segment (the 3D
interconnected layered network with internal larger microvoids amplified the exposed water-contacted surface) and a snapshot of the diffusion of H2O, Na⁺, and SO4²⁻
inside the pores (green spheres for Na⁺ and yellow spheres for SO4²⁻ by molecular simulation). (B and C) Areal R-COO⁻ and R-NH3⁺ density as a function of pH by
experiments. (D and E) Zeta potential (D) and water permeance and Na2SO4 rejection (E) as a function of pH. Error bars represent the SDs of three independent
measurements.

Fig. 4. Salt separation performance and ion selectivity of the IC-PANF membrane. (A) Performance comparison
of the IC-PANF membrane with other state-of-the-art membranes for separating Na2SO4. The red star
represents the as-prepared IC-PANF membrane (table S7). (B) Separation performance of IC-PANF membrane for
various salts. (C) Water permeance, Na2SO4 rejection, and the co-ion

The inherent structures of MPD-ice nucleation
and the IC-PANF membrane strongly
depended on the MPD concentration (23, 24).
Variations in T_m and enthalpy (ΔH_m) with ice
melting were monitored by differential scanning
calorimetry (figs. S21 and S22), where the
T_m of MPD-ice decreased from -0.2 to -3.0°C,
and the ΔH_m of ice decreased from 359.4 to
87.9 J g⁻¹ as the initial MPD concentration
increased. The decrease in the number of
H-bonded water molecules with MPD at high
concentrations suppressed ice nucleation. This
hindered the freezing of water (figs. S23 to
S27) and prevented water molecules from forming
ordered hydrogen bond networks, which
was the geometrical basis for crystallization
and the lowered T_m and ΔH_m (25). These
changes altered the MPD concentration and
diffusion rate at the interface during IC-IP synthesis
for specific membrane structure formation
(figs. S18, S28, and S29).

The freezing duration was also an essential
parameter in tailoring the nucleation and crystallization
of MPD-ice, which could disrupt
the distribution and release of MPD from
ice. I_h-ice crystals were obtained after 8 hours
of freezing, yielding 3D-quasilayered architectures
(Fig. 2E and figs. S30 to S33). Shorter
freezing times (0.5 and 2 hours) were insufficient
to crystallize the MPD-water into ice. Furthermore,
a longer freezing time (12 hours)
resulted in an undesirable pore structure due
to the partial precipitation and accumulation
of MPD molecules on the ice surface by the
MPD solute segregation (26). Moreover, T_a
had a noticeable effect on the IC-IP process (figs.
S34 and S35). Lower T_a reduced the melting rate
of MPD-ice and the release of MPD. Simultaneously,
because of the high viscosity and
high surface tension of organic solvents at low
temperature, the dissolution of MPD was
reduced, leading to the formation of defects
on the PA membrane surface. At a higher T_a,
the melting rate of ice was accelerated during
the IC-IP process, and the ice-confining role
was inevitably weakened, resulting in the
formation of a dense PA layer with lower
water permeance. The optimal T_a to yield
3D-quasilayered PA membranes was 28°C, which
is similar to the temperature used in incumbent
processes and in industry.


Membrane ionization behavior 


The 3D-quasilayered structure endowed the
IC-PANF membrane with a high ionization
density, which was crucial for enhancing the
separation efficiency of charged species in
aqueous solutions. Bulk water is a favorable medium
for R-COOH ionization, in which the
highly polarizable water molecules can readily
stabilize the electrostatic charge of R-COO⁻.
R-COOH intensively located on the membrane
surface and in large inherent pores (>>2 nm)
satisfied the ionizing conditions (27). The quantitative
ion chromatography results verified
that the associated areal R-COO⁻ densities significantly
increased for the IC-PANF membrane
compared with the C-PANF membrane
(Fig. 3, A and B). The abundant -COOH (as
noted in the x-ray photoelectron spectroscopy
results) on the surface and interior of the IC-PANF
membrane (figs. S36 to S38 and tables
S3 to S6) and the 3D interconnected layered
network with internal larger microvoids significantly
amplified the exposed water-contacted
surface, promoting the ionization of R-COOH
(Fig. 3, A and B). Furthermore, the areal R-NH₃⁺
densities of the IC-PANF membrane illustrated
single pKa (pKa ~9.5) ionization of R-NH₃⁺ and
high areal R-NH₃⁺ densities (Fig. 3C). Therefore,
the IC-PANF membrane obtained through ice
chemistry engineering remained highly ionized
over a wide pH range. The ionization behavior
of the IC-PANF membrane can be interpreted
by the ionization of R-COOH and R-NH₂ at different
pH values (Fig. 3D and fig. S39): <5, 5 to
9.5, and >9.5. At the first stage with pH < 5, all
of the R-NH₂ was converted to R-NH₃⁺ and
there was no change in its ionization. As the
solution pH approached 5, ionization of surface
R-COOH gradually occurred. When entering
the second stage (from pH 5 to 9.5), the
surface R-COOH was fully ionized, whereas
the interior R-COOH was subsequently ionized.
In the final stage (pH > 9.5), R-NH₃⁺ was completely
deprotonated (all were R-NH₂). Thus,
the zeta potential of the IC-PANF membrane
was lower than that of the C-PANF membrane
when the pH was >5. The salt separation efficacy
of the synthesized membranes as a function
of the pH was preliminarily assessed
using a Na₂SO₄ solution (Fig. 3E). At pH < 5,
SO₄²⁻ was rejected due to Na⁺ exclusion by
R-NH₃⁺ to maintain solution electroneutrality.
As the pH increased to 5, the Na₂SO₄ rejection
reached a minimum value because of surface
R-COOH ionization. The diminished SO₄²⁻
rejection at this point likely stemmed from
the increased salt sorption. SO₄²⁻ rejection increased
substantially until pH > 5 for the
IC-PANF membrane, where the deprotonation
of R-NH₃⁺ and interior R-COOH significantly
enhanced SO₄²⁻ exclusion.


Membrane ion-sieving performance 


We further evaluated the membrane separation
performance in tests with various salts to
explore the structure-ionization-property relationships.
Compared with the C-PANF membrane,
the IC-PANF membrane exhibited an
~4-fold enhancement in water permeance (i.e.,
from 6.8 to 29.7 liters m⁻² h⁻¹ bar⁻¹) with an
unnoticeable reduction in Na₂SO₄ rejection.
We also compared the performance with other
state-of-the-art PA NF membranes (Fig. 4A
and table S7) (7, 28-38). The enlarged, interconnected,
3D-quasilayered, and highly ionized
features endowed the IC-PANF membrane
with excellent hydrophilicity and high
water-adsorption ability, resulting in fast water
pathways with low resistance (figs. S40 and S41
and table S8). The salt rejection of the IC-PANF
membrane decreased in the order of Na₂SO₄
(99.6%) > MgSO₄ (93.2%) > MgCl₂ (78.3%) >
NaCl (42.5%) (Fig. 4B). The separation mechanism
of the IC-PANF membrane for various
salts was dominated by the synergistic effect of
Donnan repulsion, size sieving, and dielectric
exclusion. The highly ionized R-COOH
groups endowed the membrane with high
negative charge densities on the surface
and interior; therefore, the Donnan repulsion
for SO₄²⁻ was stronger than that for Cl⁻. In
addition, the ion solvation energy barrier (when
ions transfer from the bulk solution to the
membrane pores with low permittivity) and
the hydration radius of Cl⁻ are smaller than
those of SO₄²⁻ (table S9) (27, 39). As a result, the
IC-PANF membrane had a higher rejection
of SO₄²⁻ than of Cl⁻ and exhibited a single-salt
NaCl/Na₂SO₄ selectivity of up to 143.8 (fig. S42 and
table S10).
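
The reported figures can be cross-checked with a short calculation. The sketch below assumes that single-salt selectivity is defined as the ratio of salt passages, (100 - R_NaCl)/(100 - R_Na2SO4); this is a common convention, but whether it is the exact definition behind fig. S42 and table S10 is an assumption, and the function name is hypothetical. With the rejections quoted above (42.5% and 99.6%), it gives a value of about 143.75, in line with the stated selectivity of up to 143.8.

```python
def passage_selectivity(rejection_a_pct: float, rejection_b_pct: float) -> float:
    """Selectivity of salt A over salt B as the ratio of their passages (100 - R).

    Assumed definition for illustration; the paper may define selectivity differently.
    """
    passage_a = 100.0 - rejection_a_pct
    passage_b = 100.0 - rejection_b_pct
    if passage_b <= 0:
        raise ValueError("salt B is completely rejected; selectivity is unbounded")
    return passage_a / passage_b


# Rejections reported for the IC-PANF membrane (Fig. 4B).
nacl_rejection = 42.5
na2so4_rejection = 99.6
selectivity = passage_selectivity(nacl_rejection, na2so4_rejection)

# Fold change in water permeance, from 6.8 to 29.7 liters m^-2 h^-1 bar^-1.
permeance_gain = 29.7 / 6.8

print(f"NaCl/Na2SO4 selectivity: {selectivity:.2f}")
print(f"Permeance gain: {permeance_gain:.2f}-fold")
```

Because the Na₂SO₄ passage is only 0.4%, the ratio is very sensitive to the last reported digit of the rejection, which is worth keeping in mind when comparing selectivities across membranes.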

Given that real saline water consists of a mixture
of inorganic salts, we further measured
the co-ion Cl⁻/SO₄²⁻ selectivity using a mixture
of NaCl and Na₂SO₄. Cl⁻/SO₄²⁻ separation
is beneficial for recovering valuable salt resources
and obtaining clean fresh water (5).
The substitution of the MPD aqueous solution
with MPD-ice achieved a 6- to 10-fold higher
co-ion Cl⁻/SO₄²⁻ selectivity in the IC-PANF
membrane compared with the C-PANF membrane
(Fig. 4C and fig. S43). When increasing
the feed solution concentration, the electrostatic
screening effects were enhanced, which
reduced the membrane charge density and
weakened the electrostatic repulsion for ions,
resulting in a decline in the Cl⁻/SO₄²⁻ selectivity
or salt rejection. However, because of the
lower sensitivity to the charge densities of
the highly ionized structures, the IC-PANF
membrane retained a higher Cl⁻/SO₄²⁻ selectivity
and Na₂SO₄ rejection even at high salt
concentrations (Fig. 4C and fig. S43). The IC-PANF
membrane also demonstrated excellent
ion sieving for different co-ion systems, superior
long-term separation performance, and
promising antifouling properties (figs. S44 to
S48). This co-ion selectivity provides promise
for brine refinement and salt reclamation, but
more importantly, our ice-confined synthesis
strategy can be extended to diverse IP processes
(figs. S49 and S50).

In summary, we developed an IC-IP method 
to engineer the spatial architectures and ion- 
ization behavior of a PA NF membrane. The 
synergetic control of the reaction kinetics and 
thermodynamics of ice melting during IC-IP 
led to the formation of a highly ionized, 3D- 
quasilayered structure that combines high water 
permeance and unparalleled co-ion sieving 
abilities. We foresee that with the help of the 
presented strategy, a versatile “ice-confined” 
synthesis approach can enrich the current 
chemistry toolbox for synthesizing membranes 
and diverse advanced materials.

Boom-bust cycles in gray whales associated with
dynamic and changing Arctic conditions


Joshua D. Stewart, Trevor W. Joyce, John W. Durban, John Calambokidis, Deborah Fauquier,
Holly Fearnbach, Jacqueline M. Grebmeier, Morgan Lynn, Manfredi Manizza, Wayne L. Perryman,
M. Tim Tinker, David W. Weller


Climate change is affecting a wide range of global systems, with polar ecosystems experiencing the 
most rapid change. Although climate impacts affect lower-trophic-level and short-lived species most 
directly, it is less clear how long-lived and mobile species will respond to rapid polar warming because 
they may have the short-term ability to accommodate ecological disruptions while adapting to new 
conditions. We found that the population dynamics of an iconic and highly mobile polar-associated species 
are tightly coupled to Arctic prey availability and access to feeding areas. When low prey biomass 
coincided with high ice cover, gray whales experienced major mortality events, each reducing the 
population by 15 to 25%. This suggests that even mobile, long-lived species are sensitive to dynamic and
changing conditions as the Arctic warms.


The Bering and Chukchi seas in the Pacific
Arctic are extremely productive shallow 
basins (1-3) that support seasonal for- 
aging opportunities for a wide variety of 
migratory and Arctic-associated taxa (4). 
The Pacific Arctic food web is characterized 
by ice-associated algal growth during spring 
and early summer, which is transported to the 
benthos through decay and sinking of particu- 
late organic carbon (3). This tight pelagic- 
benthic coupling historically resulted in some 
of the most productive nearshore benthic sys- 
tems in the world (3), attracting migrants from 
throughout the Pacific and supporting large 
populations of marine species (4, 5). 

As the Arctic has rapidly warmed, sea ice 
retreat has occurred progressively earlier in the 
spring, and the Bering and Chukchi seas have 
remained ice free for longer in the autumn (6). 
This has resulted in increased water-column 
productivity (7, 8) but has reduced the amount 
of particulate organic carbon that reaches the 
sea floor through pelagic-benthic coupling that 
is dependent on sinking ice-associated algae 
(5). In addition, decreased sea ice cover allows 
stronger current-driven flow over the shallow 
basins of the Pacific Arctic, reducing the quantity
of finer-sediment grain size within the
benthos that support habitat for tube-building
amphipods, which have some of the highest 
lipid content of benthic crustaceans (9, 10). 
Collectively, these impacts have driven changes 
to the structure of Arctic benthic communities, 
which may translate into impacts on higher- 
trophic-level species that migrate seasonally to 
access these foraging hotspots (5, 9, 10). 
Eastern North Pacific gray whales (Eschrichtius 
robustus) undertake one of the longest mam- 
malian migrations between wintering areas in 
Baja California, Mexico, and summer feeding 
areas in the Bering and Chukchi seas to take 
advantage of these highly concentrated ben- 
thic prey resources (11). Gray whales have spe- 
cialized baleen plates adapted to suction feeding 
in soft sediments and are the only baleen 
whale to feed primarily on benthic prey (11). Although
they are capable of feeding on pelagic
zooplankton, the diet of gray whales feeding in 
the Arctic is dominated by benthic crustaceans— 
in particular, ampeliscid amphipods—that are 
found in abundance in shallow Arctic basins (12).
Estimates of pre-whaling population sizes 
range from 15,000 to 30,000 individuals for 
the eastern North Pacific gray whale popula- 
tion, based on population models fitted to esti- 
mates from abundance surveys combined with 
commercial and aboriginal harvest data (13). 
Genetic estimates of prehistoric abundance 
are much higher, ranging from ~75,000 to
120,000 individuals (14), although this likely
included the now endangered western North
Pacific population and may reflect a larger
carrying capacity supported by increased benthic
habitat availability during the Last Glacial
Minimum (15). Commercial whaling in the
lagoons of Baja California and throughout
the North Pacific depleted the eastern North
Pacific gray whale population to fewer than
5000 individuals by the early 1900s (13). A
rapid and sustained post-whaling increase in
abundance led to the delisting of the population
from the Endangered Species Act in 1994
and is widely viewed as an iconic example of
successful conservation and species recovery (16).

The status and stability of eastern North 
Pacific gray whales has come into question 
as the population experienced two major docu- 
mented mortality events in 1999-2000 and 
2019-2022 (17, 18). In response to the first 
mortality event in 1999, there was speculation 
that the population may have reached its 
carrying capacity and was suffering from 
density-dependent effects on survival (19). In 
light of fluctuations in reproductive output and 
a second major mortality event two decades 
later, many studies have proposed that variable 
and changing Arctic conditions may be drivers 
of eastern North Pacific gray whale population 
dynamics (12, 20-22). 

Arctic sea ice extent has been proposed as a 
contributor to gray whale vital rates—especially 
reproduction—by physically restricting access 
to summer feeding areas (20, 22, 23). However, 
in recent years previously identified relation- 
ships between gray whale reproduction and 
Arctic sea ice extent have begun to decouple 
(22, 23), and variability in sea ice has been 
insufficient to explain mortality rates (20).
Eastern North Pacific gray whales have the 
most complete long-term abundance and demo- 
graphic data available for any large whale 
species, and we leveraged these extensive data- 
sets to examine environmental drivers of pop- 
ulation dynamics not possible in other species. 
We combined time series of gray whale abun- 
dance, reproduction, nutritive condition, and 
strandings spanning more than half a century 
into a population dynamics model to esti- 
mate annual carrying capacity for the pop- 
ulation. We show that this annual carrying 
capacity is well explained by ice-mediated 
access to the population’s primary foraging 
grounds in the Arctic and biomass of benthic
crustaceans. The observed boom-bust cycles in 
gray whale abundance and vital rates suggest 
that as large whales recover from post-whaling 
depletion, their populations may become in- 
creasingly governed by environmental con- 
straints and climate variability. 


Results and discussion 


We combined 31 estimates of eastern North 
Pacific gray whale abundance over 54 years 
(1968 to 2022) (24), 30 estimates of calf pro- 
duction over 42 years (1980 to 2022) (22, 25), 
1391 records of stranded gray whales on the 
United States coastline over 48 years (1974 to 
2022), and 1334 body condition measure- 
ments over 32 years (1987 to 2019) (26) into an 
integrated population dynamics model that 
estimates annual abundance, birth rates, and 
mortality rates. The model uses evidence of 
human interactions in stranded gray whales 
to estimate proportional hazards of anthropogenic
and natural contributions to mortality.
In addition, the model estimates both the long-term
carrying capacity (K), as well as an annually
varying carrying capacity (K_t) that reflects
year-to-year variation in the strength of negative
density dependence as determined by environmental
covariates and stochastic effects. We
considered three Arctic time series as candidate
covariates for annual gray whale carrying
capacity: (i) access to feeding grounds, defined
as the number of days with <50% sea ice cover
on the historic gray whale foraging grounds
in the Chirikov basin and southern Chukchi
Sea (1979 to 2021) (23, 27); (ii) benthic infaunal
crustacean biomass, averaged over the same
foraging hotspots as sea ice access (1971 to
2019) (28); and (iii) zooplankton density estimated
by using a global ocean ecosystem
model that includes the entire Arctic Ocean
ecosystem, averaged over gray whale foraging
hotspots (1992 to 2020) (29). The data and
population model are described in detail in the
Data sources and Integrated population model
sections of the supplementary materials.

Fig. 1. Population dynamics of eastern North Pacific gray whales. (A) Gray whales have
experienced major fluctuations in abundance after an initial post-whaling recovery,
including three major declines beginning in 1987, 1999, and 2019. (B to E) These
declines and subsequent recoveries in the 1990s and 2000s were associated with
synchronous changes in (B) births and (C) mortality, as well as changes in nutritive
condition in (D) southbound and (E) northbound migrating whales. Black points in
(A) and (B) indicate the median estimated abundance and calf production from visual
surveys, with standard errors of model estimates (vertical bars). Black points in (D)
and (E) indicate the mean values of body condition measurements from each survey
year and the standard deviation of observations (vertical bars). In (A) to (E), the
black lines indicate the median of the posterior distribution of model-estimated
values, and the shaded regions indicate the 95% posterior credible intervals.

The eastern North Pacific gray whale pop- 
ulation has experienced three major mortality 
events, each resulting in reductions of 15 to 
25% of total abundance within the half-century 
of nearly continuous monitoring, representing 
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extraordinarily high periodic mortality rates for 
a long-lived vertebrate (Fig. 1). These mortality 
events were associated with peaks in reported 
strandings during the 1999-2000 and 2019-2022 
periods. The 1987-1989 abundance decline is 
the largest in magnitude but was not asso- 
ciated with an increase in strandings, likely 
because reporting structures and survey effort 
to detect strandings were expanded and im- 
proved substantially beginning in 1990. How- 
ever, this major impact to the population is also 
reflected in the poorest recorded body condi- 
tion of the survey history in 1988, falling rapidly 
from very good condition in 1987 (Fig. 1D). The 
population dynamics model estimated low annual
carrying capacities (K_t) of approximately
10,000 individuals during each of these die-offs
(Fig. 2A), indicating that Arctic foraging
grounds periodically experience major disrup- 
tions, limiting the number of whales that 
they can support. These fluctuations in annual 
carrying capacity were represented in mortality 
rates, body condition, and most strongly in 
birth rates, which had the greatest propor- 
tional change with varying carrying capacity 
(fig. S5). On the basis of anthropogenic injury 
rates in stranded whales, model-estimated an- 
thropogenic mortality rates remained low and 
stable, whereas natural mortality rates varied 
substantially and peaked during major die- 
offs, suggesting direct human impacts such 
as vessel strikes and entanglements in fishing 
gear are not the primary drivers of mortality in 
this population. 

The maximum birth rate estimated by the 
model was 0.111 (95% credible intervals 0.108 
to 0.114). The realized annual birth rate ranged 
from a low of 0.0046 in 1998 (0.0024 to 
0.0076) to a high of 0.085 in 1975 (0.062 to 
0.102). Within the span of calf production observations
(1994-2022), the minimum birth
rate was 0.007 in 2000 (0.004 to 0.01), and the
maximum was 0.082 in 2004 (0.069 to 0.09).
The minimum estimated mortality rate was 
0.011 (0.009 to 0.014). The realized annual 
mortality rate ranged from a low of 0.019 in 
1975 (0.014 to 0.027) to a high of 0.13 in 1988 
(0.099 to 0.162). During the three major mortal- 
ity events, median estimated mortality rates 
were 0.13 and 0.079 (in 1988 and 1989); 0.065 
and 0.099 (in 1999 and 2000); and 0.092, 0.089, 
0.061, and 0.067 (from 2019 to 2022). 

Model-estimated mean body condition was 
lowest in 1988 [median 0.162, 95% confidence 
interval (CI) 0.158 to 0.166], 2000 (0.165, 0.163 
to 0.168), and 2020 (0.167, 0.163 to 0.170). The 
highest estimated body condition was in 1975 
(0.184, 0.181 to 0.187), although there were no 
photogrammetric measurements before 1987. 
The 3 years with highest estimated body con- 
dition and corresponding condition measure- 
ments were 2013 (0.181, 0.180 to 0.183), 2012 
(0.181, 0.179 to 0.183), and 1997 (0.181, 0.179 
to 0.182). The estimated northbound body
condition scaling factor was 0.922 (0.913 to
0.930), indicating an ~8% decline in body condition
between southbound and northbound
measurements.

Fig. 2. Drivers of eastern North Pacific gray whale carrying capacity. (A) Estimated
annual carrying capacity (K_t) from the population dynamics model, with reference
lines at 25,000 (dashed line) and 10,000 (dotted line). (B) Estimated ice access
anomaly, which is the Z-scored number of days with 50% or lower ice cover on gray
whale feeding grounds. (C) Estimated crustacean biomass anomaly, which is the
Z-scored mean grams of carbon of benthic crustaceans on key gray whale feeding
grounds. (D) Decline in benthic crustacean per capita biomass from 1970 to 2019,
showing the relationship each sampling year between benthic crustacean abundance
and biomass in grams of carbon (gC). In (A) to (C), the black lines indicate the
median of the posterior distribution of estimates, and the shaded regions indicate
the 95% posterior credible intervals.
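
The ice access and crustacean biomass anomalies shown in Fig. 2, B and C, are described as Z-scored series. A minimal sketch of that standardization follows, assuming the sample standard deviation is used (the source does not specify) and using made-up day counts rather than the actual covariate data; the function name is hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize a series to mean 0 and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, not stated in the source
    if sd == 0:
        raise ValueError("series has no variation; z-scores are undefined")
    return [(v - mean) / sd for v in values]


# Hypothetical numbers of days per year with <50% sea ice cover (illustration only).
ice_free_days = [180, 195, 210, 170, 240, 225]
ice_access_anomaly = z_scores(ice_free_days)

for days, anomaly in zip(ice_free_days, ice_access_anomaly):
    print(f"{days} days -> anomaly {anomaly:+.2f}")
```

Standardizing each covariate this way puts ice access (in days) and crustacean biomass (in grams of carbon) on a common, unitless scale, so their estimated effects on carrying capacity can be compared directly.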

The estimated long-term average K was 
22,062 (18,967 to 24,725). This long-term average 
is lower than the median of annual K_t values
(24,500, 95% CI 21,771 to 27,797), which is to be 
expected given that it is the arithmetic mean 
outcome of a stochastic process and thus re- 
flects the effects of environmental variability 
on expected abundance (30). 

We found a significant positive relationship 
between benthic crustacean biomass and carry- 
ing capacity (99.9% probability slope > 0), no 
relationship with zooplankton density (39.2% > 0), 
and a high probability of a positive relation- 
ship with sea ice access (93.5% > 0). With the 
zooplankton density covariate eliminated from 
the model, both crustacean biomass (100% > 0) 
and sea ice access (96.2% > 0) had significant 
positive relationships with carrying capacity. 
This suggests that the ability of the eastern 
North Pacific gray whale population to physi- 
cally access key feeding areas, in combination 
with in situ prey availability, explains fluctua- 
tions in body condition, reproduction, and 
mortality. The three major mortality events 
occurred during periods of simultaneous low
crustacean biomass and restricted access to
feeding areas (Fig. 2). In 2010, a rapid decrease 
in crustacean biomass but a period of average 
ice access led to a depression in birth rates and 
a modest decrease in abundance but not a 
major mortality event. The onset of the 2019 
mortality event appears to have been driven 
initially by low crustacean biomass and exacer- 
bated by a steep reduction in access to feeding 
areas over the following 2 years. 
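
The percentages quoted for each covariate (for example, 99.9% probability that the slope is > 0) are posterior probabilities. In a Bayesian fit they are typically computed as the fraction of posterior draws of the slope that fall above zero. The sketch below illustrates that calculation with simulated draws standing in for sampler output; the distributions, their parameters, and the function name are all assumptions for illustration, not values from the fitted model.

```python
import random


def prob_positive(samples):
    """Fraction of posterior draws greater than zero."""
    samples = list(samples)
    if not samples:
        raise ValueError("no posterior samples supplied")
    return sum(1 for s in samples if s > 0) / len(samples)


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Simulated draws (made-up means and SDs) for a clearly positive effect
# and for an effect centered on zero.
strong_effect = [rng.gauss(0.6, 0.2) for _ in range(10_000)]
null_effect = [rng.gauss(0.0, 0.2) for _ in range(10_000)]

print(f"P(slope > 0), strong effect: {prob_positive(strong_effect):.3f}")
print(f"P(slope > 0), null effect:   {prob_positive(null_effect):.3f}")
```

A covariate with no relationship to carrying capacity yields a probability near 50%, which is how the 39.2% reported for zooplankton density can be read.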

The decision to model gray whale popula- 
tion dynamics by applying annual covariate 
effects to carrying capacity (K), rather than 
the population’s intrinsic growth rate (r), is
uncommon. Although in theory either model 
formulation could be used to explain fluctua- 
tions in abundance and vital rates, we believe 
that applying covariate effects to carrying ca- 
pacity better reflects biological reality. The 
Bering and Chukchi seas are the primary feed- 
ing area for virtually all eastern North Pacific 
gray whales, suggesting that the quality and 
quantity of prey in these areas will have a 
greater impact on vital rates when there is high 
intraspecific competition at higher levels of 
gray whale abundance. This is supported empiri- 
cally by our estimates of population growth rate 
relative to abundance. Mean population growth 
rates were significantly higher at low than at
high abundance levels, and major busts (annual
declines of >9 to 10%) only occurred when the 
gray whale population was at high abundance 
(fig. S9), which supports the existence of density- 
dependent controls on vital rates. By applying 
covariate effects to carrying capacity, we simul- 
taneously account for environmental conditions 
and the effects of negative density dependence
(31). In addition, this avoids a scenario in
which, in a model that applies covariate effects 
to r instead of K, the population exceeds a sta- 
tionary carrying capacity but continues to grow 
because of positive covariate effects on growth 
rate. Instead, our estimated annual carrying 
capacity (K_t) captures short-term fluctuations
in the strength of density dependence and can 
be interpreted as an abstract parameter corres- 
ponding to the expected equilibrium abundance 
if environmental conditions remained fixed at 
the values recorded during that year (32). 
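
The idea of applying covariate effects to carrying capacity rather than to growth rate can be illustrated with a deliberately simplified sketch. It assumes a deterministic discrete-time logistic model and a log-linear link between covariates and K_t; the integrated model described here is more complex (it estimates birth and mortality rates separately and includes stochastic effects), and every function name and parameter value below is hypothetical.

```python
import math


def annual_carrying_capacity(k_long_term, betas, covariates):
    """K_t = K * exp(sum(beta_i * x_i)); the log-linear link is an assumption."""
    return k_long_term * math.exp(sum(b * x for b, x in zip(betas, covariates)))


def project(n0, r, k_long_term, betas, covariate_series):
    """Discrete logistic growth with covariate effects applied to K, not to r."""
    abundance = [n0]
    for covariates in covariate_series:
        k_t = annual_carrying_capacity(k_long_term, betas, covariates)
        n = abundance[-1]
        # Growth turns negative whenever abundance exceeds that year's K_t.
        abundance.append(max(n + r * n * (1.0 - n / k_t), 0.0))
    return abundance


# Hypothetical values for illustration only (not estimates from the model).
K = 22_000.0
betas = (0.3, 0.4)                 # effects of ice access and crustacean biomass anomalies
good_years = [(0.5, 0.5)] * 3      # above-average access and prey
bad_years = [(-1.5, -1.5)] * 2     # restricted access and low prey together
trajectory = project(20_000.0, 0.05, K, betas, good_years + bad_years)

for year, n in enumerate(trajectory):
    print(f"year {year}: {n:,.0f} whales")
```

In this formulation a population near its long-term K declines as soon as poor conditions push K_t below current abundance, whereas the same conditions would have little effect on a small population. That is the density-dependent behavior the paragraph above argues for.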
Over the past 50 years, the per capita bio- 
mass of benthic infaunal crustaceans has de- 
clined precipitously (Fig. 2D and fig. S3), and 
the three major gray whale mortality events 
coincided with periods of low per capita bio- 
mass, which translated into low total crusta- 
cean biomass. This decline in per capita biomass 
is most likely associated with species distribution
shifts of benthic amphipods and other
crustaceans. As ice cover decreases in response
to rapid Arctic warming, current speed in the 
Chirikov basin has increased, leading to larger 
sediment grain size and reduced particulate 
organic carbon reaching the seafloor (5). These 
conditions favor smaller amphipods with lower 
lipid content over the lipid-rich, tube-building 
ampeliscid amphipods that historically domi- 
nated the shallow basins of the Bering and 
Chukchi seas (10). This regime shift has likely 
contributed to declining per capita biomass of 
gray whale prey, which despite steady or in- 
creasing prey abundance has resulted in lower 
overall available biomass (fig. S3). 

The combined effect of sea ice cover and 
benthic productivity on gray whale population 
dynamics has driven major boom-bust cycles, 
including two modern booms in abundance 
that may have exceeded preexploitation levels 
(13). High benthic biomass and prey quality in 
the late 1970s and early 1980s supported al- 
most 25,000 gray whales, contributing to their 
delisting from the US Endangered Species Act. 
More recently, rapid Arctic warming in re- 
sponse to climate change increased access to 
feeding areas (Fig. 2B), supporting a sustained 
increase in gray whale abundance over the 
past decade (Fig. 1A). Although recent Arctic 
warming may have provided sufficient benefit 
to the population to counteract decreasing 
benthic biomass over the short term, the 
outlook for benthic prey quality is not favorable. 
Rising water column and bottom water tem- 
peratures and projected decoupling of pelagic 
and benthic productivity caused by retreating 
sea ice will likely lead to continued declines in 
Arctic benthic crustacean biomass (5). Access 
to feeding areas reached a peak of 266 days 
in 2019, which is presumably approaching a 
point of diminishing returns given that the spe- 
cies migrates to Mexico each winter. Poleward 
shifts in gray whale feeding locations have 
already been documented, which likely reflect 
the declining quality and shifting distribution 
of their preferred prey (12). Future declines in
benthic biomass will likely drive decreases 
in gray whale carrying capacity that cannot be 
offset by continued increases in ice access. 
Reports of gray whales shifting their Arctic 
feeding distribution and targeting pelagic prey 
(12) suggest that they may have the ability to 
compensate for these changing conditions to 
some extent, but our results suggest that any 
ongoing behavioral adaptations have thus far 
been insufficient to prevent major mortality 
events. 

Eastern North Pacific gray whales are the 
most closely monitored large whale species, with 
records of abundance, reproduction, mortality, 
and condition spanning more than half a cen- 
tury. The abundance of most large whale species 
remains far below pre-whaling levels (33, 34), 
which limits our understanding of the dynamics
and behavior of whale populations as
they approach carrying capacity and become
increasingly governed by density-dependent 
processes. By contrast, gray whales have re- 
covered rapidly from post-whaling lows to num- 
bers that may approach or exceed pre-whaling 
levels and have low rates of direct human 
mortality, providing a rare window into the 
possible natural fluctuations of large whale 
populations. The periodic mortality events and 
major population swings that we report are 
surprising for a long-lived vertebrate that must 
by definition have high average survival rates 
to facilitate longevity. However, whales achieve 
their immense body sizes by feeding on large 
quantities of low-trophic-level prey (35), which 
may make them sensitive to oceanographic 
and environmental fluctuations. The feeding- 
fasting cycles associated with migratory baleen 
whales may also increase their susceptibility 
to environmental perturbations. Gray whales 
migrate more than 15,000 km each year and 
rely on a 4- to 5-month feeding season to sup- 
port a majority of their energetic requirements 
for the year. This strategy may place them at a 
physiological threshold at which disruptions 
to their food supply translate into major im- 
pacts to vital rates—a pattern that may be 
widespread across migratory whales and may 
become more pronounced as species and pop- 
ulations recover to their pre-whaling abun- 
dances. Climate-driven ocean warming is 
expected to have profound impacts on ocean 
circulation, upwelling strength, and primary 
production (36, 37), which may in turn have 
major implications for large whale population 
dynamics and viability through predator-prey 
interactions (34). 
Probiotic-guided CAR-T cells for solid tumor targeting


Rosa L. Vincent, Candice R. Gurbatri, Fangda Li, Ana Vardoshvili, Courtney Coker, Jongwon Im,
Edward R. Ballister, Mathieu Rouanne, Thomas Savage, Kenia de los Santos-Alexis,
Andrew Redenti, Leonie Brockmann, Meghna Komaranchath, Nicholas Arpaia, Tal Danino


A major challenge facing tumor-antigen targeting therapies such as chimeric antigen receptor (CAR)-T cells is 
the identification of suitable targets that are specifically and uniformly expressed on heterogeneous solid 
tumors. By contrast, certain species of bacteria selectively colonize immune-privileged tumor cores and can be 
engineered as antigen-independent platforms for therapeutic delivery. To bridge these approaches, we 
developed a platform of probiotic-guided CAR-T cells (ProCARs), in which tumor-colonizing probiotics 
release synthetic targets that label tumor tissue for CAR-mediated lysis in situ. This system demonstrated 
CAR-T cell activation and antigen-agnostic cell lysis that was safe and effective in multiple xenograft 
and syngeneic models of human and mouse cancers. We further engineered multifunctional probiotics that 
co-release chemokines to enhance CAR-T cell recruitment and therapeutic response. 


Although there has been marked success
in the use of chimeric antigen receptor 
(CAR)-T cells for hematological malig- 
nancies, effective targeting of solid tu- 
mors has been limited. A key challenge 
of antigen-targeted cell therapies relates to 
the expression patterns of the antigen itself, 
which makes the identification of optimal tar- 
gets for solid tumor cell therapies an obstacle 
for the development of new CARs (1-3). Few
tumor-associated antigens (TAAs) identified 
on solid tumors are tumor restricted, and thus, 
they carry a high risk of fatal on-target, off- 
tumor toxicity because of cross-reactivity against 
proteins found in vital tissues (4-6). Moreover, 
if a safe target can be identified, TAAs are often 
heterogeneously expressed, and selection pres- 
sures from targeted therapies ultimately lead to 
antigen-negative relapse (7, 8). Emerging strat- 
egies to address the antigen bottleneck have 
focused on improving CAR-T cells with ad- 
ditional genetic circuitry (9-11), modulatory 
proteins (12-17), or combinations with nano- 
particles and oncolytic viruses (18-21). 
Whereas CAR-T cells require considerable 
engineering to target and infiltrate solid tu- 
mors, bacteria can selectively colonize immune- 
privileged tumor cores and preferentially grow 
within hostile hypoxic and necrotic tumor 
microenvironments (TMEs) (22). Indeed, a 
multitude of patient studies have shown that 
different tumor types host different tumor 
microbiomes (23-25). To take advantage of 
these inherent properties, several groups have 
established an array of synthetic gene circuits 
to engineer a new class of prokaryotic cell therapy
(26, 27). These approaches have used engineered
bacteria as intratumoral bioreactors
that continually produce a range of payloads, 
which results in tumor regression and mitiga- 
tion of systemic side effects (28-31). Notably, 
clinical trials with engineered bacteria have 
reported minimal toxicities in patients with 
solid tumors, although these have yet to dem- 
onstrate considerable clinical efficacy across a 
broad range of indications (32-34). 

In this work, we bridge the complementar- 
ity of these two cell therapies in a platform of 
probiotic-guided CAR-T cells (ProCARs), whereby 
T cells are engineered to sense and respond to 
synthetic CAR targets that are released by tumor- 
colonizing, probiotic bacteria. This approach 
leverages the antigen independence of tumor- 
seeking microbes to create a combined cell ther- 
apy platform that broadens the scope of CAR-T 
cell therapy to include difficult-to-target tumors. 


Results 
Synthetic CAR targets “Tag” tumor cells for 
lysis by ProCAR-T cells 


To create a TAA-independent ProCAR platform, 
we engineered a well-characterized probiotic 
strain, Escherichia coli Nissle 1917 (EcN), which 
is equipped with a genomically integrated, syn- 
chronized lysis circuit (SLIC) for quorum- 
regulated delivery of synthetic CAR targets directly 
to the tumor core. With this system, bacterial 
growth reaches a critical population density ex- 
clusively within the niche of the solid TME and 
subsequently triggers lysis events that cyclically re- 
lease genetically encoded payloads in situ (30, 35). 
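
The quorum-regulated release described above can be sketched as a toy simulation. This is only an illustration of threshold-triggered, cyclic lysis and is not the SLIC circuit model; the function name and every parameter (growth factor, threshold, surviving fraction) are hypothetical.

```python
def simulate_lysis_cycles(steps, growth_factor, threshold, survivor_fraction, start=1.0):
    """Toy model: a population grows until it reaches a threshold, then lyses.

    Returns the population after each step and the steps at which lysis
    (and therefore payload release) occurred. All parameters are assumptions.
    """
    population = start
    history = []
    lysis_steps = []
    for step in range(steps):
        population *= growth_factor
        if population >= threshold:
            # Quorum reached: most cells lyse and release their payload,
            # and a small surviving fraction reseeds the next cycle.
            lysis_steps.append(step)
            population *= survivor_fraction
        history.append(population)
    return history, lysis_steps


history, lysis_steps = simulate_lysis_cycles(
    steps=10, growth_factor=2.0, threshold=8.0, survivor_fraction=0.125
)
```

With these illustrative values the population repeatedly climbs to the threshold and collapses, so release events recur at regular intervals after a single seeding.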

First, we designed extracellular proteins that 
can “tag” components of the TME for CAR- 
mediated lysis. Specifically, we fused a homodimer 
of superfolder green fluorescent protein (sfGFP) 
(diGFP)—previously shown to mediate CAR-T 
cell responses to soluble, dimeric ligands (36)— 
to the heparin-binding domain (HBD) of placenta
growth factor-2 (PlGF-2(123-144)) (37). Notably,
the HBD from PlGF-2 broadly anchors
to collagens, fibronectins, and heparan sulfate
proteoglycans (HSPGs), which are found in
high abundance on most solid tumors
within the dense extracellular matrix (ECM) of
the tumor stroma (38). We hypothesized that 
heparin-binding CAR targets (“Tags”) would 
benefit the ProCAR system twofold: (i) by 
limiting diffusion beyond ECM-dense tumor 
margins, which thereby enhances the safety 
of the system, and (ii) by facilitating CAR po- 
larization akin to conventional CAR and T cell 
receptor signaling (39), which thus results in 
greater antitumor activity (Fig. 1A). 

To achieve stable protein expression in vivo, 
we expressed the CAR targets under a constitu- 
tive tac promoter from an Axe/Txe stabilized 
plasmid (40) (fig. S1, A and B) and confirmed 
efficient collagen binding of Tags after His-tag- 
mediated protein purification (fig. S2, A to 
C). Next, we composed a GFP-specific CAR 
from an sfGFP-binding nanobody sequence 
(41) linked to CD28 and CD3ζ intracellular signaling
domains by a short immunoglobulin G4
(IgG4) hinge for coexpression with an mScarlet
reporter. With this, we were able to monitor 
the transduction efficiency of human T cells by 
mScarlet expression and confirm surface ex- 
pression through CAR receptor binding to 
purified, monomeric sfGFP (Fig. 1B). 

We first sought to measure CAR-T cell activ- 
ation in the presence of each purified sfGFP 
variant in comparison to MDA-MB-468 cells 
engineered to express a membrane-bound form of 
sfGFP (“mbGFP”) (fig. S1, C to E). Here, we found 
that GFP CAR-T cells (GFP28z) comparably
activated in response to collagen-bound Tags, 
moderately activated in response to soluble 
diGFP, and remained unchanged by exposure 
to monomeric sfGFP (Fig. 1C). Quantification 
of CD69 surface expression demonstrated dose- 
dependent activation, with increased levels ob- 
served in response to collagen-bound Tags (figs. 
S2D and S3A). We additionally evaluated intracellular
levels of T helper cell 1 (TH1) proinflammatory
cytokines, for which GFP28z displayed
higher frequencies of polyfunctional CD8+ T
cells that produce interferon-γ (IFN-γ) and tumor
necrosis factor-α (TNF-α) in response to
either collagen-bound Tags or mbGFP (fig. S3, 
B to D). Both diGFP and collagen-bound Tag 
yielded similar fold expansion after a single 
exposure to either target (fig. S3E). 

Although previous studies did not observe 
target cell lysis in response to soluble ligands 
(36), we hypothesized that Tags may facilitate 
the lysis of cancer cells by binding to membrane- 
spanning matrix proteins (42). To assess cell 
surface interaction, we incubated untransduced 
(UT) T cells, triple negative breast cancer (TNBC), 
and ovarian cancer cell lines with diGFP or 
Tag variants and demonstrated that Tags 
robustly coat the surface of both cancer cell 
lines but do not bind the surface of rested or 
activated T cells (Fig. 1D and fig. S4A). More- 
over, we did not observe significant surface 
Fig. 1. Synthetic CAR targets “Tag” tumor cells for lysis by ProCAR-T cells
in situ. (A) Schematic demonstrating the ProCAR platform, in which
synthetic CAR targets (Tags) are produced and released in situ by tumor-colonizing
probiotic bacteria (E. coli Nissle 1917) to label ubiquitous components
of solid tumors for de novo lysis by GFP-CARs (GFP28z). Tags are designed
as dimers of sfGFP fused to an HBD (PlGF-2(123-144)) that broadly bind to cell
surface and matrix proteins found in the TME. (B) Representative flow cytometry
histograms demonstrating GFP28z surface expression through binding purified
sfGFP (left) and coexpression of mScarlet (right) in primary human T cells.
(C) Flow cytometric quantification of CD25 surface expression after exposure to
100 ng/ml of GFP-based CAR targets for 16 hours [monomeric sfGFP (GFP),
soluble diGFP, or collagen-bound GFP (Tag)], shown relative to MDA-MB-468
cells expressing mbGFP at a 1:1 ratio. Data represent mean ± SD of n = 3
biological replicates. MFI, mean fluorescence intensity. (D) Representative flow
cytometry histograms of surface-bound GFP after incubation with 500 ng/ml
Tag-GFP or diGFP for 20 min. (E) Confocal microscopy images of Jurkats
expressing GFP28z-mScarlet fusion receptors for subcellular visualization. CARs
are shown in orange and cocultured with unlabeled MDA-MB-468 target cells;
images were acquired every 2 to 5 min after addition of 100 ng/ml of purified Tag.
White arrows indicate CAR clusters. (F) Overnight killing assay against ffLuc+
HEK293T target cells at defined E:T ratios. CAR-T cells were cocultured with target
cells ± 100 ng/ml of CAR targets for 20 hours. Specific lysis (%) was determined
by normalizing relative light units (RLU) to cocultures with UT T cells. Data represent
mean ± SD of n = 3 biological replicates. (G) Overnight killing assay of ffLuc+
HEK293T target cells at a 3:1 E:T with half log dilutions of purified Tag. Data
represent mean ± SD of n = 3 biological replicates. (H) Overnight killing assays
against a panel of ffLuc+ target cells at a fixed E:T ratio (3:1) and treated as in (F).
(I) Overnight killing assay of ffLuc+ MDA-MB-468 in the presence of 20 ng/ml
human HSPE (hHSPE) at a 1:1 E:T ratio, ± 100 ng/ml Tag. Data represent mean ± SD
of n = 3 biological replicates. ***P < 0.001; ****P < 0.0001; two-way analysis of
variance (ANOVA) [(C), (F), (H), and (I)] or one-way ANOVA (G), Holm-Sidak
multiple comparison correction. ns, not significant; 468, MDA-MB-468.
binding of additional mouse lymphocyte and
myeloid immune effector cells relative to mouse 
colorectal cancer cells (fig. S4B). We then hy- 
pothesized that surface-bound Tags may indeed 
facilitate synapse formation between CARs and 
the target cell; thus, we engineered an addition- 
al GFP28z receptor with a C-terminal mScarlet 
fusion to track CAR receptor localization by 
confocal microscopy. Notably, by 30 min after 
addition of purified Tag, GFP28z-mScarlet re- 
ceptors appeared to cluster at the interface 
between each T cell and target cell, whereas 
T cells supplied with soluble diGFP and untreated
cells remained unchanged (Fig. 1E and
fig. S5, A and B). 

In line with these findings, we observed mini- 
mal cytotoxic activity of GFP28z in response to 
phosphate-buffered saline (PBS) or diGFP in 
killing assays of ffLuc+ human embryonic kidney
(HEK) 293T cells at multiple effector-to-target
(E:T) ratios. Conversely, GFP28z cells
provided with Tags were able to drive signif- 
icant levels of target cell lysis at high and low 
E:T ratios (Fig. 1F). We confirmed that this
response is dose dependent and that cytotoxicity
was still observable at Tag concentrations as
low as 1.5 ng/ml (Fig. 1G). Moreover, this combination
achieved significant lysis across a panel of
seven genetically distinct human cancer cell 
lines (Fig. 1H) while demonstrating no added 
effect on conventional CD19 (1928z)- or intercellular
adhesion molecule-1 (ICAM-1) (ICAM28z)-directed
CAR-T cells (fig. S6, A and B). We next
measured the surface expression of CD107a—a 
membrane-bound molecule commonly used as 
a proxy for cytotoxic degranulation—and found 
higher expression on GFP28z incubated with 
collagen-bound Tags than cells exposed to 
diGFP (fig. S6C). Correspondingly, we detected 
Fig. 2. ProCAR-T cells yield antigen-agnostic therapeutic efficacy in
response to Tags and bacterial adjuvants provided by probiotics in situ.
(A to C) Nalm6 cells (5 x 10°) were implanted subcutaneously (“s.c.”) into the
hind flank of NSG mice. When tumor volumes reached ~100 mm³, mice were
intratumorally injected with 1 x 10° CFU of engineered probiotic strains (Pro*)
producing diGFP (Pro^diGFP) or Tag (Pro^Tag) targets or an empty control (Pro^−) (A).
Then, 2.5 x 10° GFP28z+ ProCAR-T cells were delivered 48 hours post bacteria
treatment (pbt), with tumor growth monitored by caliper measurements every
3 to 4 days. Mean tumor trajectories (B) and survival curves (C) are shown. Data
represent mean ± SEM of n ≥ 4 biological replicates. (D) ELISA quantification of
sfGFP levels from tumor homogenates (left) and serum (right) on day 14 pbt; data
represent SEM of n = 3 biological replicates. (E and F) MDA-MB-468 cells (5 x 10°)
were subcutaneously implanted into the hind flank of NSG mice. When tumors
reached palpable volume, mice were intratumorally injected with 1 x 10° CFU of


Vincent et al., Science 382, 211-218 (2023) 13 October 2023 




Pro^Tag or control Pro^− strains or PBS. On days 2 and 15 pbt, tumors were treated
with 2.5 x 10° GFP28z+ ProCAR-T cells, and tumor growth was measured as in (A).
Mean tumor trajectories are shown (F). Data represent mean ± SEM of n ≥ 3
biological replicates. (G to K) Nalm6 tumors were established and treated as in (A)
and resected on day 2 after T cell treatment (day 4 pbt) for analysis by flow
cytometry. (H) Frequency of IT hCD45+CD3+CD8+ T cell memory and effector
populations determined by CD62L and CD45RO expression patterns. Data represent
mean ± SEM of n ≥ 3 biological replicates. (I) Flow cytometric quantification
of CD69 surface expression on IT hCD45+CD3+CD8+ cells in each treatment group.
Data represent mean ± SEM of n = 3 biological replicates. (J and K) Luminex
quantification of IT IFN-γ (J) and TNF-α (K) concentrations. Data represent
mean ± SD of biological replicates. *P < 0.05; **P < 0.01; ***P < 0.001; ****P <
0.0001; two-way ANOVA [(B), (F), and (H)], log-rank test (C), or one-way ANOVA
[(D), (I) to (K)], with Holm-Sidak multiple comparison correction.
Fig. 3. Treatment with the ProCAR platform provides a systemic therapeutic
benefit in an immune-competent model of colorectal cancer. (A) CT26 cells
(5 x 10°) were implanted subcutaneously into the hind flank of BALB/c mice.
When tumor volumes reached ~100 mm³, mice were treated with 5 x 10° CFU of
engineered probiotic strains (Pro*) producing Tag (Pro^Tag) or an empty control
(Pro^−). On days 2 and 5 pbt, tumors were treated with 2.5 x 10° mGFP28z+
T cells, and growth was monitored by caliper measurements every 3 to 4 days.
Mean tumor trajectories are shown. Data represent mean ± SEM of n ≥ 10
biological replicates. (B) A20 cells (5 x 10°) were implanted subcutaneously into
the hind flank of BALB/c mice. When tumor volumes reached 200 to 300 mm³,
mice were treated with probiotics as in (A). On day 2 pbt, tumors were treated
with 1 x 10° mGFP28z+ T cells. Mean tumor trajectories are shown. Data
represent mean ± SEM of n ≥ 3 biological replicates. (C to E) MC38 cells (5 x
10°) were implanted subcutaneously into both hind flanks of C57BL/6 mice.
When tumor volumes reached ~150 mm³, the left tumor received a single
injection of 2 x 10° CFU Pro^Tag, Pro^−, or a PBS control (C). On days 2 and 5 pbt,
tumors on the left flank were treated with 1.5 x 10° mGFP28z+ T cells. Tumors on
the right flank were left untreated. Mean tumor trajectories of the treated
tumors (D) and untreated tumors (E) are shown. Data represent mean ± SEM of
n ≥ 4 biological replicates. (F to J) C57BL/6 mice were grafted and treated
as in (C). On day 9 pbt, treated tumors and tumor-draining lymph nodes were
retrieved for analysis by flow cytometry. (F) Frequency of IT CD69+CAR+ T cells;
representative flow cytometry histograms (left) and quantification are shown
(right). Data represent mean ± SEM of n ≥ 3 biological replicates. (G) Frequencies of
CD69+ tumor-infiltrating CD8+ and CD4+Foxp3− (Tconv) CAR− T cells. Data
represent mean ± SEM of n ≥ 3 biological replicates. (H) Frequency of Ki-67+
tumor-infiltrating CAR CD8+ T cells from each treatment group. (I and J) Frequency
of activated (CD40+MHCII+) (I) and PD-L1+ (J) proinflammatory monocytes
(CD11b+Ly6C+) in the lymph nodes of treated and control mice. Data represent
mean ± SEM of n ≥ 3 biological replicates. *P < 0.05; **P < 0.01; ****P < 0.0001;
two-way ANOVA [(A), (B), (D), (E), and (G)] or one-way ANOVA [(F), (H), (I),
and (J)]; ANOVAs performed with Holm-Sidak multiple comparison correction.
higher levels of perforin and granzymes A and
B in the supernatant of these cells, which to- 
gether indicate direct mechanisms of cytotox- 
icity (fig. S6D). 
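
The luciferase-based killing readout used in these assays can be sketched as follows. The source states only that specific lysis was determined by normalizing relative light units (RLU) to cocultures with UT T cells; the exact formula below, 100 × (1 − RLU_sample / RLU_UT), is a common convention and is assumed here, as are the function names and example values.

```python
import statistics


def specific_lysis_percent(sample_rlu, control_rlu):
    """Percent specific lysis relative to a UT T cell coculture (assumed formula)."""
    if control_rlu <= 0:
        raise ValueError("control RLU must be positive")
    # Less luciferase signal than the control means fewer surviving target cells.
    return (1.0 - sample_rlu / control_rlu) * 100.0


def summarize_lysis(replicate_rlus, control_rlu):
    """Mean and sample SD of specific lysis across biological replicates."""
    values = [specific_lysis_percent(rlu, control_rlu) for rlu in replicate_rlus]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical triplicate RLU readings against a UT control of 10,000 RLU.
mean_lysis, sd_lysis = summarize_lysis([2000, 2500, 3000], control_rlu=10000)
```

Under this convention, a well with the same signal as the UT control scores 0% lysis, and a well with no remaining signal scores 100%.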

We then sought to assess the importance of 
general matrix and adhesion molecules to the 
observed phenotype by incubating cancer cell 
lines with Tags after trypsinization. As antici- 
pated, we observed a significant reduction in 
the level of surface-bound-GFP after trypsin- 
mediated cleavage of adhesion proteins (fig. 
S6E). Thus, we next considered the specific 
contribution of cell surface heparan sulfates 
(HSs) and HSPGs as a mechanism of direct cell 
interaction. Indeed, incubation with heparinase 
(HSPE) and cleavage of HS chains significant- 
ly disrupted the lysis of MDA-MB-468 cells by 
GFP28z, whereas the cytotoxic activity of 1928z 
was not inhibited by HSPE in a killing assay 
of Nalm6 cells (Fig. 1I and fig. S6F). Taken together,
these results suggest that the Tag design
confers broad utility that facilitates antigen- 
agnostic CAR activity. 


ProCAR system mediates antigen-agnostic 
therapeutic efficacy in human xenograft models 


Motivated by the cytotoxicity observed in vitro, 
we sought to characterize the full effects of 
the ProCAR platform in vivo, first by using non- 
obese diabetic scid gamma (NSG) mice bearing 
subcutaneous Nalm6 tumors. We have previous- 
ly shown that EcN-SLIC (Pro*) strains can exclu- 
sively grow to a critical population density within 
the tumor core, synchronously lyse, and cyclically 
release therapeutic payloads every 48 to 72 hours 
after a single intratumoral (IT) injection (35). 
Therefore, we chose to treat tumors with a single 
IT injection of 1 x 10° colony-forming units (CFU)
of Pro* strains either producing diGFP (Pro^diGFP)
or Tag (Pro^Tag) targets, or an empty control (Pro^−),
48 hours before IT delivery of 2.5 x 10° GFP28z+
cells (Fig. 2A). Here, GFP28z in combination
with Pro^diGFP demonstrated no appreciable
therapeutic efficacy and did not slow tumor
growth compared with tumors treated with
control Pro^− strains alone (Fig. 2B and fig. S7A).
By contrast, Pro^Tag strains mediated potent antitumor
responses, which achieved significantly
slowed tumor growth and correspondingly im- 
proved survival (Fig. 2C). 
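
Tumor growth in these experiments was followed by caliper measurements. As a hedged illustration of how such measurements are commonly converted to volumes and summarized, the sketch below uses the modified ellipsoid formula V = (length × width²) / 2. The source does not state which formula was used, so the formula, the function names, and the example measurements are all assumptions.

```python
import math
import statistics


def tumor_volume_mm3(length_mm, width_mm):
    """Caliper-based tumor volume using V = L * W^2 / 2 (assumed convention)."""
    # Treat the longer measurement as the length and the shorter as the width.
    long_axis = max(length_mm, width_mm)
    short_axis = min(length_mm, width_mm)
    return long_axis * short_axis ** 2 / 2.0


def mean_sem(values):
    """Mean and standard error of the mean for one group at one time point."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


# Hypothetical caliper readings (mm) for three mice in one treatment group.
volumes = [tumor_volume_mm3(l, w) for l, w in [(8.0, 5.0), (10.0, 6.0), (9.0, 5.5)]]
group_mean, group_sem = mean_sem(volumes)
```

A helper of this kind would also let an enrollment threshold such as ~100 mm³ be checked per animal before treatment begins.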

We monitored the body weight of mice as a 
proxy for mouse health from the start of bacte- 
ria treatment and observed no significant weight 
loss in mice treated with Pro” alone, GFP28z 
alone, or any combination of the two cell ther- 
apies (fig. S7B). Additionally, we did not detect 
bacterial growth outside of tumor homoge- 
nates on day 3 and 14 post-bacteria treatment 
(pbt) (fig. S7C) and found that Tag-expression 
plasmids were well maintained in vivo (fig. 
S7D). To assess the tumor retention of pro- 
biotically delivered CAR targets, we quantified 
the level of GFP in tumor homogenates and 
matched serum samples from mice treated 
with Pro^−, Pro^diGFP, and Pro^Tag strains through
GFP-specific enzyme-linked immunosorbent 
assay (ELISA). We detected comparable levels 
of both CAR targets in tumor homogenates 
across Pro^diGFP and Pro^Tag groups (Fig. 2D, left),
which suggests that the observable differences 
in efficacy were the result not of unequal tar- 
get abundance but rather of the ability of Tags 
to promote target cell killing when soluble tar- 
gets do not. Furthermore, higher concentra- 
tions of sfGFP were detected in the serum of 
mice treated with Pro^diGFP, which suggests that
Tags reduce leakage into systemic circulation 
(Fig. 2D, right). To assess off-tumor Tag abun- 
dance directly, we quantified GFP concentra- 
tion in tumor, lung, kidney, spleen, and liver 
homogenates from mice bearing subcutane- 
ous HCT116 tumors and did not detect appre- 
ciable levels of GFP or bacterial growth in any 
healthy organ 14 days pbt (fig. S8, A to D). 
We next considered the antigen-agnostic ca- 
pability of the ProCAR platform and assessed 
therapeutic efficacy in a mouse model of hu- 
man TNBC. Notably, TNBC tumors lack estro- 
gen and progesterone receptors as well as HER2 
expression, which makes them unresponsive 
to available targeted therapies and an oppor- 
tunity for new treatment options (43). For ini- 
tial assessment, mice bearing subcutaneous 
MDA-MB-468 tumors were treated with an IT 
injection of Pro* strains or PBS 2 days before 
receiving an IT injection of PBS, GFP28z, or control
ICAM28z CAR-T cells. ProCARs responding
to Pro^Tag demonstrated enhanced antitumor
efficacy relative to ICAM28z despite high ICAM-1 
surface expression (fig. S9, A to C) (44), likely 
owing to the presence of potent Toll-like receptor 
(TLR) agonists inherent to the ProCAR system. 
By day 28 after treatment, we observed signs 
of T cell dysfunction across all treatment groups 
(fig. S9E) despite the preserved bacterial lysis
behavior and GFP production observed in Pro^Tag
isolates (fig. S9F). Therefore, to mitigate T cell
exhaustion, we increased the frequency of T cell 
treatments to two doses, spaced 2 weeks apart 
while keeping to a single dose of probiotics. After 
this, the ProCAR system achieved durable anti- 
tumor efficacy, with no tumor growth observed 
for 70 days after engraftment (Fig. 2, E and F; fig. 
S10A). Moreover, treatment of an additional 
subcutaneous SKOV3 tumor model with ProCAR-T
cells in combination with Pro^Tag significantly
slowed tumor growth (fig. S10, B and C).
Although we did not observe nonspecific effects
from the presence of Pro^Tag strains on UT
T cells (fig. S10, D and E), we consistently noted
moderate and model-independent activity of
ProCAR-T cells in combination with control Pro^−
strains. Notably, activated T cells up-regulate
TLR4 and TLR5 expression (45, 46), of which 
lipopolysaccharides (LPSs) and flagellin found 
on EcN are the respective agonists. Therefore, 
we hypothesized that IT bacteria may serve as 
an adjuvant to enhance ProCAR-T cell activity 
(fig. S11A). To test this in vitro, we monitored
the surface expression of CD69, CD25, and 
CD107a on GFP28z cells exposed to Pro lysate 
or Tags separately, or the combination of both 
stimuli. GFP28z demonstrated significantly in- 
creased levels of all three markers in response 
to lysate alone, which indicates TLR stimulation, 
whereas the combination of lysate and collagen- 
bound Tags stimulated the highest expression 
(fig. S11B). This combination effect was also
mirrored in the levels of proinflammatory cyto- 
kines detected in cell culture supernatants and 
in the enrichment of Teff populations, which
demonstrates a strong shift toward cytotoxic 
effector functions (fig. S11, C and D). 

To study the effects of CAR targets produced 
by live bacteria, we investigated the phenotype 
of GFP28z isolated from Nalm6 tumors on 
day 4 after treatment with PBS, Pro^−, or Pro^Tag
(Fig. 2G). CD8+ GFP28z T cells from Pro^Tag-treated
tumors were again significantly enriched
for differentiated Teff populations, whereas cells
from Pro^−-treated tumors displayed a more
modest trend toward differentiation relative to 
PBS-treated, or even purified Tag-treated, con- 
trols (Fig. 2H and fig. S12A). As a measure of 
activation, GFP28z cells displayed significantly 
increased CD69 and CD25 expression in response
to Pro^− and Pro^Tag strains (Fig. 2I and
fig. S12B). Correspondingly, tumors treated with 
either strain were found to contain significantly 
increased levels of human proinflammatory 
cytokines (Fig. 2, J and K, and fig. S12C), whereas 
the early exhaustion phenotype of GFP28z was 
found to be inversely correlated with exposure 
to bacteria, which demonstrates the potential 
benefit of MyD88 signaling after TLR activa- 
tion (47) (fig. S12D). Together, these observations 
highlight the use of tumor-colonizing bacteria 
to mediate antigen-agnostic CAR-T cell activity 
while simultaneously boosting T cell effector 
functions through natural TLR stimulation. 


Treatment with the ProCAR platform 
provides a systemic therapeutic benefit in an 
immune-replete model of colorectal carcinoma 


Although the use of immunocompromised mice 
allows for the study of human T cell behavior 
in a bacterial platform, we next aimed to un- 
derstand the performance of the ProCAR plat- 
form in the context of a functional immune 
system and comprehensive TME. To do this, 
we generated a GFP CAR using the same sfGFP-specific
nanobody fused to the signaling components
of mouse CD28 and CD3ζ (mGFP28z)
for expression in mouse T cells and confirmed 
efficient killing of mbGFP-MC38 target cells 
(fig. S13, A and B). Consistent with our obser- 
vations of human GFP28z, mouse CAR-T cells 
were strongly activated by collagen-bound Tag 
(fig. S13, C and D) and were able to drive the 
lysis of mouse TNBC and colorectal cancer cell 
lines through direct interaction with Tag and 
the target cell surface (fig. S13, E and F). 
Fig. 4. Multifunctional probiotics produce combinations of TME-modulating
factors to facilitate systemic delivery and delay the growth of orthotopic
breast tumors. (A) Probiotics are engineered to release an activating
mutant of the human chemokine, CXCL16^K42A (“CXCL16”), to recruit CXCR6+
ProCAR-T cells directly to the tumor site. (B) MDA-MB-468 cells (5 x 10°)
were subcutaneously implanted into the hind flank of NSG mice. Palpable tumors
were then either injected with 1 x 10° CFU of a strain engineered to produce
both CXCL16 and Tag in combination (Pro^Combo), control strains producing
single agents (Pro^Tag, Pro^CXCL16), or a PBS control. On days 2 and 15 pbt, mice
were intravenously treated with 6 x 10° GFP28z+ T cells. Mean tumor growth
trajectories are shown. Data represent mean ± SEM of n ≥ 3 biological
replicates per group. (C) Counts of infiltrating hCD45+CD3+ cells per milligram
of tumor. MDA-MB-468 tumors were established and treated as in (A). On
day 7 after treatment, tumors from Pro^Tag and Pro^Combo treatment groups
were retrieved and homogenized for analysis by flow cytometry. Data represent
mean ± SEM of n = 4 biological replicates per group. (D and E) MDA-MB-468
tumors were established and measured as in (B). When tumors reached
palpable size, mice were intravenously treated with 5 x 10° CFU of Pro*
strains or PBS. On days 3 and 17 pbt, mice were intravenously treated with
6 x 10° GFP28z+ ProCAR-T cells. Mean tumor trajectories are shown (E).
Data represent mean ± SEM of n ≥ 5 biological replicates per group. I.V.,
intravenously. (F to I) An orthotopic model of TNBC was established through
the surgical implantation of 5 x 10° MDA-MB-468 cells into the mammary
fat pad (“m.f.p.”) of female NSG mice. When tumor volumes reached
~100 mm³, mice were treated as in (D); mean tumor growth trajectories are
shown (G). Data represent mean ± SEM of n ≥ 6 biological replicates per
group. (H and I) Biodistribution assessment of Pro” (H) and sfGFP (I) in
tumor, lung, kidney, spleen, and liver homogenates. On day 42 pbt, tumor
and matched healthy tissue were digested and plated with the appropriate
antibiotics for colony quantification or assessed by ELISA for sfGFP
concentration. Data represent mean ± SEM of n = 5 (H) or n = 3 (I) biological
replicates per group. Limit of detection (LOD), 200 CFU/g. *P < 0.05; **P <
0.01; ****P < 0.0001; two-way or one-way (C) ANOVA, with Holm-Sidak
multiple comparison correction.


With the mouse components in hand, we ini- 
tially sought to test the platform in immune- 
competent BALB/c mice bearing subcutaneous 
CT26 colorectal carcinomas. We dosed tumors 
directly with 2 x 10° CFU of Pro^− or Pro^Tag alone
or followed by two doses of autologous 2.5 x
10° mGFP28z+ T cells spaced 6 days apart. Notably,
treatment with Pro^Tag alone was not sufficient
to generate an appreciable antitumor
response without the co-delivery of mGFP28z, 
whereas the combination significantly slowed 
tumor progression (fig. S14, A and B). Moreover, 
decreasing the time between T cell treatments
to 3 days apart yielded significant antitumor
efficacy that led to occurrences of tumor regression
[complete response (CR) = 3/13] and improved
survival (Fig. 3A and fig. S14, C and
D), whereas treatment with a single low dose 
of mGFP28z (1 x 10°) in combination with
Pro^Tag was able to control the growth of large
A20 lymphomas (Fig. 3B and fig. S14E). 

In these studies, we chose to omit the com- 
mon practice of preconditioning lymphode- 
pletion (48, 49) to observe any potential effects 
on the endogenous immune system. Thus, we 
aimed to investigate whether IT injection of 
the ProCAR system could generate signs of 
systemic antitumor immunity in a dual-flank 
MC38 colorectal carcinoma model established
in C57BL/6 mice (Fig. 3C). Notably, we observed 
that unilateral treatment of one tumor not only 
resulted in instances of tumor regression on 
the treated side (CR = 2/7) but led to a signif- 
icant reduction in the growth rate of distal, 
untreated tumors (Fig. 3, D and E, and fig. 
S14F). In contrast to the human system, we did 
not observe an appreciable effect of mGFP28z 
responding to control Pro^− on either side, which
suggests that mouse T cells may be less sensi- 
tive to TLR agonism. 

To understand the underlying immune re- 
sponse, we next investigated the immuno- 
phenotype of cells isolated from the tumors 
and tumor-draining lymph nodes of treated 
and control mice (fig. S15A). Analogous to the
human system, mGFP28z+ T cells displayed
the highest activation in response to Pro^Tag
treatment (Fig. 3F). As a measure of bystander
T cell activation, we observed significantly increased
CD69 expression on tumor-infiltrating
CD8+ and conventional CD4+Foxp3− (Tconv)
T cells in the CAR-negative CD3+ fraction of
Pro^Tag-treated tumors (Fig. 3G). We also noted
probiotic-related increases in the frequencies
of Ki67+ and CD44+ tumor-infiltrating Tconv
cells (Fig. 3H and fig. S15B), whereas analysis
of the IT myeloid populations revealed increased
frequencies of activated (CD40+MHCII+) monocytes
(fig. S15C). Notably, we observed significantly
expanded populations of activated and
PD-L1+ monocytes in the tumor-draining lymph
nodes of treated mice (Fig. 3, I and J, and fig. 
S15D). Together, these data support the hypoth- 
esis that the cooperation between IT probiot- 
ics and activated CAR-T cells can propagate 
an adaptive response from the endogenous im- 
mune system, which ultimately leads to a sys- 
temic antitumor benefit. 


Multifunctional probiotics demonstrate 
selective colonization of human TNBC and 
yield therapeutic efficacy in orthotopic breast 
tumor models 


To advance the technology in an immunocom- 
promised host, we pursued systemic delivery 
of human ProCAR-T cells by intravenous in- 
jection. Conventionally, CAR-T cell trafficking 
to solid tumors is often impeded by mismatched 
chemokine and receptor expression (50). More- 
over, given the cyclical release of Tags relative 
to the constitutive surface expression of tradi- 
tional TAAs, we hypothesized that circulating 
GFP28z cells may have increased difficulty locating
their target.

Thus, we equipped an additional strain with
the mechanism to co-release an activating mutant
of human CXCL16 (CXCL16^K42A, in which
lysine 42 is substituted with alanine) (51) and
directly recruit ProCAR-T cells in circulation
(Fig. 4A). Notably, CXCL16 is reported to interact
exclusively with its cognate receptor,
CXCR6, expressed on effector memory T cells
(52). For this, we expressed CXCL16^K42A and
Tag genes from a single Axe/Txe stabilized vector
under separate pTac and J23100 promoters
(Pro^Combo), respectively, and generated matched
single control strains (Pro^CXCL16 and Pro^Tag)
with equivalent payload expression (fig. S16, A
to C). As anticipated, treatment of subcutaneous
TNBC tumors with the Pro^Combo strain,
followed by intravenous infusion of ProCAR-T
cells, reduced tumor growth without affecting
mouse body weight (Fig. 4B and fig. S16, D
and E). Pro^CXCL16 also yielded an appreciable
therapeutic benefit in this system, likely owing to
the increased T cell infiltration and response
to probiotic TLR agonists. Analysis of tumor
homogenates revealed high and comparable
payload expression and significantly increased
hCD45+CD3+ T cell counts in tumors treated with
Pro^Combo relative to tumors treated with Pro^Tag
alone (Fig. 4C and fig. S16, F and G), which together
suggest that chemokine support is necessary
for T cell recruitment and therapeutic
efficacy in inaccessible subcutaneous tumors.

We then sought to assess the feasibility 
of delivering the complete ProCAR platform 
by systemic injection. We carefully monitored 
mouse health during a dose escalation study 
of intravenously delivered Pro strains and 
noted that initial weight loss in response to 
higher CFUs was recovered by day 7 after 
treatment and that systemic delivery did not 
lead to bacterial growth in healthy organs de- 
spite achieving 100% tumor-colonization effi- 
ciency by 48 hours after infusion (fig. S17, A 
to E). With this information, we treated mice 
bearing subcutaneous TNBC tumors with a
single intravenous dose of Pro^Tag or Pro^Combo
strains at 5 x 10° CFU, followed 72 hours later
by the first infusion of ProCAR-T cells (Fig. 4,
D and E). Again, tumors colonized by Pro^Combo
strains demonstrated significantly slowed
growth relative to PBS and Pro^Tag treatment
groups without causing discernible signs of
off-tumor toxicity (fig. S17, F to I).

Ultimately, we established an orthotopic 
model of TNBC through the surgical implan- 
tation of MDA-MB-468 cells into the mam- 
mary fat pad (MFP) of female NSG mice (Fig. 
4F). In this model, intravenous infusion of
the ProCAR system led to appreciable therapeutic
efficacy in both Pro^Tag and Pro^Combo
treatment groups, although Pro^Combo provided
enhanced therapeutic benefit (Fig. 4G and
fig. S18, A and B). Moreover, we observed similar
probiotic colonization levels restricted to
MFP tumors (Fig. 4H) and minimal off-tumor
sfGFP expression in healthy organs (Fig. 4I).
To specifically study the tissue distribution of
Pro^Tag and Tag within the MFP, we retrieved tumors
and surrounding healthy tissue for immunohistochemical
staining for E. coli and GFP
(fig. S19). E. coli staining of tumor sections from
Pro^X and Pro^Tag treatment groups revealed
probiotic localization to the tumor core, with
correspondingly strong GFP detection in the
core of Pro^Tag-treated tumors. Moreover, hematoxylin
and eosin sections of healthy organs
taken from treated and untreated groups
did not show evidence of tissue damage (fig.
S20A). In addition, intravenous treatment in
an immune-replete context did not yield signs
of inflammatory organ damage in the serum of
female Friend Virus B NIH Jackson (FVB)
mice bearing syngeneic MFP tumors derived from the
mammary-specific polyomavirus middle T antigen
overexpression mouse model (MMTV-PyMT)
(fig. S20, B to D). Collectively, these mouse
model data demonstrate the use of engi- 
neered probiotics to selectively grow within 
the TME niche and safely release combinations 
of CAR-T cell enhancing payloads in situ. 


Concluding remarks 


We have demonstrated an approach to engi- 
neering interactions between living therapies, 
in which tumor-colonizing probiotics have been 
repurposed to guide the cytotoxicity of engi- 
neered T cells. We have shown that by fusing 
synthetic CAR targets to an HBD, we can achieve 
antigen-agnostic cell death, and by harnessing 
the tumor-restricted growth of E. coli, we can 
release these targets directly within the TME 
to achieve efficacy in genetically distinct mouse 
models of human and mouse cancer. These 
findings highlight the potential of the ProCAR 
platform to address the roadblock of identify- 
ing suitable CAR targets by providing an an- 
tigen that is orthogonal to both healthy tissue 
and tumor genetics. 

Notably, even the gold standard CD19 CAR 
faces antigen-dependent issues, the loss of which 
has become a frequent cause of patient relapse 
(7), and off-tumor expression on brain mural 
cells has been linked to reports of dangerous 
neurotoxicity (5). The antigen bottleneck ap- 
pears to have a greater impact upon treatment 
of solid cancers, and approaches to overcome 
these issues have primarily relied on incorporat- 
ing complex genetic circuitry to afford greater 
control over TAA recognition. Strategies tar- 
geting more than one antigen can circumvent 
issues of tumor escape (53, 54), yet the chal- 
lenge of finding a single suitable target can 
limit this approach for most solid tumors. 

Tumor pattern recognition (13) and Boolean- 
gated logic circuits (11) provide elegant solu- 
tions to the issue of off-tumor toxicity; however, 
they involve complex T cell engineering and 
insertion of multiple transgenes. Other approaches
to build universal CARs, in which the
antigen recognition domain is provided by separate
intravenous infusion, enable the exchange
of antigen specificity during treatment
(12). Additionally, CARs secreting bispecific T cell
engagers (BiTEs) against broadly expressed
targets have been effective in preclinical models
of heterogeneous tumors (14). Although
these strategies represent considerable ad- 
vancements, such approaches rely on conven- 
tional target identification and tumor genetics. 
Thus, several groups have looked to nanopar- 
ticles, oncolytic viruses, and IT injection as al- 
ternative platforms to deliver CAR-T cell targets 
to solid tumors (17-21). 

Here, the use of bacteria in the ProCAR sys- 
tem offers a partner organism that facilitates 
tumor-specific target delivery while concur- 
rently providing natural inflammatory prop- 
erties that serve to enhance the antitumor 
response. Our study in xenograft tumor models
revealed a greater sensitivity of human
T cells toward bacterial adjuvants, as shown
by consistent antitumor benefits in response
to control Pro^X strains. By contrast,
study of the ProCAR system in syngeneic tumor 
models demonstrated that a bacteria-based 
immunotherapy may be sufficient to prime 
systemic antitumor immunity in treated tumors, 
which can subsequently direct responses against 
uncolonized tumors or “untagged” tumor areas. 
Nonetheless, together these studies have dem- 
onstrated the tumor-specific growth of engi- 
neered probiotics in both immune-replete and 
immunocompromised mice, without the gen- 
eration of apparent systemic infection or in- 
flammatory damage. 

Although not assessed here, humans are 
more sensitive to endotoxins than mice, and 
an important concern for clinical translation 
will be the potential toxicity from systemic 
administration of a Gram-negative bacterial 
therapy (55). Thus, a critical approach for trans- 
lation will be to introduce genetic attenua- 
tions that have previously enabled both local 
and intravenous administration of bacterial 
therapies in clinical trials (33, 56). Structural 
modifications to LPSs have been shown to sub- 
stantially reduce TLR4 stimulation without dis- 
rupting bacteria viability and tumor colonization 
(57, 58). Such attenuations could be combined 
with additional circuits to further restrict bac- 
teria growth and reduce immunogenicity to 
facilitate safe systemic delivery and repeat dosing
(59, 60). Overall, combining the advantages
of tumor-homing bacteria and CAR-T cells provides
a new strategy for tumor recognition
and, in turn, builds the foundation for engineered
communities of living therapies.
Direct observation of glycans bonded to proteins
and lipids at the single-molecule level


Kelvin Anggara*, Laura Srsan, Thapakorn Jaroentomeechai, Xu Wu, Stephan Rauschenbach,
Yoshiki Narimatsu, Henrik Clausen, Thomas Ziegler*, Rebecca L. Miller*, Klaus Kern
Proteins and lipids decorated with glycans are found throughout biological entities, playing roles in
biological functions and dysfunctions. Current analytical strategies for these glycan-decorated 
biomolecules, termed glycoconjugates, rely on ensemble-averaged methods that do not provide a full 
view of positions and structures of glycans attached at individual sites in a given molecule, especially for 
glycoproteins. We show single-molecule analysis of glycoconjugates by direct imaging of individual 
glycoconjugate molecules using low-temperature scanning tunneling microscopy. Intact glycoconjugate 
ions from electrospray are soft-landed on a surface for their direct single-molecule imaging. The 
submolecular imaging resolution corroborated by quantum mechanical modeling unveils whole 
structures and attachment sites of glycans in glycopeptides, glycolipids, N-glycoproteins, and
O-glycoproteins densely decorated with glycans.


Glycan (also known as carbohydrate) is
one of the four essential organic building
blocks found in all forms of life
(1-7). Glycans play key roles in cellular
functions (7, 8), growth and development
(2, 9), identification (2-4), shapes
(10, 11), and energy storage (12). In biological
systems, glycans are predominantly
found attached to other biomolecules such 
as proteins and lipids. These glycan-decorated 
biomolecules, termed glycoconjugates, are pro- 
duced through complex enzymatic glycosyla- 
tion events—the most common and diverse type 
of posttranslational modification (PTM), which 
greatly expands the functions of biomolecules 
(13, 14). The abundance of glycoconjugates in 
biological systems and their roles in health 
and disease make them attractive targets in 
basic and translational research for new thera- 
peutic and diagnostic strategies (1-3, 15, 16). 
However, despite the ubiquity and importance 
of glycoconjugates, research to unveil their 
structure-property relationships has been 
challenging (17-19). 

Glycoconjugates possess extensive structural 
heterogeneity (i.e., multiple variants of se- 
quence) and structural isomerism (i.e., struc- 
tures with equal masses), which pose a challenge 
for today’s analytical methods (17-19). Glycocon- 
jugates are presently analyzed by a combi- 
nation of chemical labeling, chemoenzymatic 
digestion, and ensemble-averaged methods to
indirectly obtain the most likely structures 
present in a sample (8, 19). Ensemble-averaged 
analysis of structurally heterogeneous and iso- 
meric glycoconjugates, however, obscures the 
position and structures of glycans bonded to 
a biomolecule, particularly for proteins with 
multiple glycans attached. As a result, insights 
into the structures of individual molecules are 
lost with ensemble-averaged analysis, which 
hinders structure-property relationship stud- 
ies of glycoconjugates. Preventing the loss of 
structural information for individual molecules 
requires glycoconjugate molecules to be ana- 
lyzed at the single-molecule level. 

We realize single-molecule analysis of glyco- 
conjugates by performing direct, label-free, 
spatial imaging on individual glycoconjugate 
molecules. We show that imaging single glyco- 
conjugates at subnanometer resolution reveals 
the primary structure of each molecule, by un- 
veiling how its constituent amino acid, lipid, 
and monosaccharide subunits connect to one 
another. As a result, our imaging method 
establishes glycan sequences at every glycan 
attachment site in a glycoconjugate by locating 
every monosaccharide in the glycoconjugate 
molecule discriminated by its stereoconfigu- 
ration and side group. We demonstrate our 
method for a wide range of glycoconjugates, 
starting from simple glycopeptides and glyco- 
lipids up to complex glycoproteins with more 
than 20 attached glycans. Our work shows 
that single-molecule imaging provides direct 
access to all glycan structures bonded to the 
complex glycopeptides, glycolipids, and glyco- 
proteins at the single-molecule level. 


Direct imaging of glycoconjugates soft-landed 
at a surface 


We accomplished direct imaging of single- 
glycoconjugate molecules by combining 
soft-landing electrospray ion beam deposition 
(ESIBD) (20, 21) and scanning tunneling microscopy
(STM) (see Methods for details).
We show that STM imaging of single glyco- 
conjugates on a Cu surface at cryogenic tem- 
peratures corroborated by density functional 
theory (DFT) calculations provides direct access 
to their structural information (Fig. 1). We 
imaged glycoconjugate ions obtained from 
nanoelectrospray ionization (nESI) (22) (fig. S1), 
whose usage was critical to lower the amount 
of sample required to ~1 nanomole, given that 
glycoconjugates, unlike proteins (23, 24) and
glycans (25), are more limited in sample quantity.
In cases of sulfated molecules (fig. S2),
we deposited the molecules on a more inert 
Ag surface to preserve the labile sulfate groups 
(fig. S3). 

We first highlight the capabilities of STM 
imaging and DFT modeling in characteriz- 
ing monosaccharide structures of glycans. We 
imaged two glycopeptides (26) (Fig. 1, B and C)
composed of a disaccharide (cellobiose, Glcβ1-4Glc,
or lactose, Galβ1-4Glc) linked to a tripeptide,
and three glycosaminoglycans (GAGs)
(Fig. 1, D to F). STM imaging of the glycopeptides
was found to differentiate the glycan
and peptide moieties by their heights: tall/bright
for the glycan and low/dim for the peptide
(Fig. 1, B and C), which allowed the DFT calculations
to yield the primary structure of the
glycopeptides, resolving the order by which
amino acid and monosaccharide subunits in
the molecule are connected to one another. In
addition, STM imaging was found to discriminate
glucose from its epimer, galactose (i.e.,
they differ only in the stereoconfiguration of
their C4 atoms), given that the glucose (h =
2.1 ± 0.2 Å, N = 45) was observed to be consistently
taller than the galactose (h = 1.7 ± 0.3 Å,
N = 75). STM imaging was also used to locate
and identify side groups and sulfate moieties
present in every monosaccharide, as exemplified
by the imaging of GAGs (Fig. 1, D to F).
Each GlcNAc monosaccharide was observed
with a dim protrusion corresponding to the
N-acetyl (NAc) moiety, clearly distinct from
the GlcA monosaccharides (Fig. 1D). Notably,
each GlcNAc6S and GalNAc6S monosaccharide
was observed with its sulfate moiety (a dim
protrusion encircled with a dark ring) on the
opposite side from its NAc moiety (a dim protrusion
without dark ring) (Fig. 1, E and F),
whereas GlcNS6S was observed with two sulfate
moieties, each appearing as a dim protrusion
with a dark ring (Fig. 1E). We further ascertained
the STM appearance of sulfate moiety
on the same surface as a dim protrusion en- 
circled with a dark ring by imaging simple aryl 
sulfates (fig. S3). Our findings show that STM 
imaging and DFT modeling have sufficient 
sensitivity and resolution to locate glycans in 
molecules and discriminate the constituent 
monosaccharides based on their stereoconfig- 
urations and side groups. 
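
The height comparison reported above for glucose and galactose can be checked from the summary statistics alone. The following is a minimal sketch, assuming that the quoted "±" values are standard deviations (the text does not say whether they are standard deviations or standard errors) and that a Welch two-sample comparison is appropriate; the function name `welch_t` is illustrative and not part of the original analysis.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (mean, SD, sample size)."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# Apparent heights in angstroms as quoted in the text:
# glucose 2.1 +/- 0.2 (N = 45), galactose 1.7 +/- 0.3 (N = 75).
# ASSUMPTION: the +/- values are standard deviations.
t_stat, dof = welch_t(2.1, 0.2, 45, 1.7, 0.3, 75)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Under these assumptions the difference in mean height is large relative to its standard error, although the two per-molecule height distributions still overlap, so a single measurement would not always separate the two epimers.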
Fig. 1. STM imaging of simple glycoconjugates and glycosaminoglycans (GAGs). (A) Schematic of
the experiment: glycoconjugate or GAG ions generated by nESI were mass-selected, soft-landed intact on the
surface held at 120 K, and imaged by STM at 11 K (see Methods). STM images of glycopeptides, Glc-Glc-AsnPheAla
(B) and Gal-Glc-AsnPheAla (C), reveal the glycan and the peptide domains of the molecule
and differentiate each monosaccharide in the glycan domain, i.e., glucose (Glc) versus galactose (Gal).
Imaging GAGs (D to F) reveals the positions of N-acetyl groups on all N-acetylglucosamine (GlcNAc)
monosaccharides, differentiating them from the glucuronic acid (GlcA) monosaccharides, as well as the
sulfated GlcNAc6S, GalNAc6S, and GlcNS6S monosaccharides. The GAGs in (D) and (F) are terminated
by para-nitrophenyl (pnp), whereas in (E) the GAG is terminated by para-(6-azidohexanamido)phenyl (pap).
STM images were interpreted by STM simulation of molecular structures computed by DFT. Scale bar is
0.5 nm. Glycan icons follow the SNFG standard (57).


We show the perspectives of our direct single- 
molecule analysis by determining structures of 
entire glycans present in complex glycopep- 
tides, glycolipids, and glycoproteins at the single- 
molecule level. For glycopeptides, we examined 
an egg yolk sialoglycopeptide derivative (Fig. 
2A), which is widely used in biochemical appli- 
cations (27). For glycolipids, the GM3 and GD3 
gangliosides (Fig. 2, B and C) were chosen
because of their roles as cancer antigens (5, 15).
For N-glycoproteins, we chose the widely studied
pancreatic RNase B (28). Finally, as a representative
for O-glycoproteins, we chose a fragment
of human mucin MUC1, one of the most
complex glycoproteins in biological systems,
which is also overexpressed with aberrant O-glycans
in cancer and is a promising cancer biomarker
and immunotherapeutic target (29, 30).
In all cases, there is a clear height contrast
between the bright glycan domain and the
dim peptide or lipid domains that establishes
their respective primary structures. For the
N-glycopeptide (Fig. 2A), the height contrast
discriminated the GlcNAc and Fuc monosaccharides
(h = 2.2 ± 0.4 Å, N = 132) from the
mannose (h = 1.9 ± 0.4 Å, N = 66), whereas in
the glycolipids (Fig. 2, B and C), the height
contrast differentiated glucose (one lobe, h =
2.1 ± 0.3 Å, N = 158), galactose (one lobe, h =
1.9 ± 0.2 Å, N = 158), and sialic acid (Neu5Ac)
(two or more lobes) from one another. The
imaging was found to distinguish glycosidic
bonds by the characteristic angle formed between
monosaccharides when their pyranose
rings adsorb horizontally on a surface (experimentally
verifiable by their respective heights).
For example, the Siaα2-3Galβ1-4Glc in GM3
and GD3 (Sia = Neu5Ac monosaccharide) was
observed to form an obtuse 141 ± 22° angle
(N = 158), whereas the Glcβ1-4Glcβ1-4Glc in
cellohexaose was observed to form a straight
180 ± 25° angle (N = 204) (31). In addition,
the imaging of single glycolipids revealed their
molecular conformations (fig. S4) and allowed
discrimination of the ceramide moiety with 
varied lipid chain lengths (fig. S5), both of 
which may provide additional information 
regarding structural studies of lipids. Notably, 
we observed the “open” conformation of the 
lipid moiety in glycolipids (Fig. 2C and fig. S4), 
which has been discussed in relation to mech- 
anisms of membrane fusions and protein- 
membrane interactions (32, 33). 


Imaging single N- and O-glycoproteins 


Direct imaging of a single N-glycoprotein, RNase 
B, revealed the structure and the location of 
the N-glycan bonded to the protein backbone 
(28) (Fig. 3). We examined RNase B by imag- 
ing individual proteins in their fully unfolded 
state, which we prepared by exclusively de- 
positing the highly charged protein ions on a 
surface (24) (fig. S6). Given that RNase B has 
five glycoproteoforms (28) (each featuring
one of five distinct N-glycan structures from
Man5GlcNAc2 to Man9GlcNAc2), our single-protein
imaging allowed the glycoproteoforms
of RNase B to be determined one molecule at 
a time, as shown in Fig. 3B for MangGlcNAc, 
and in Fig. 3C for MansGlcNAcy,, which are 
the two most abundant glycoproteoforms 
of RNase B (fig. S1G). The STM imaging 
clearly revealed the glycan attachment site 
by locating the intersection between the 
N-glycan and the protein backbone (red dots 
in Fig. 3, B and C). Analysis of 33 individual 
N-glycoproteins confirmed residue 34 (+ 2) 
as the glycan attachment site, consistent with 
Asn34 as the known glycan attachment site for 
RNase B (28). 
Fig. 2. STM imaging of single glycopeptides and glycolipids. STM images of an N-glycopeptide
(biantennary N-glycan NGA2F on the KVANKT peptide) (A), and glycolipids, GM3 ganglioside (B) and
GD3 ganglioside (C), reveal the glycan, peptide, and ceramide (Cer) domains in the respective molecules
(Cer consisted of a fatty acid chain and sphingosine of varying length). STM imaging differentiates
individual monosaccharides in the glycoconjugate; i.e., glucose (Glc), galactose (Gal), sialic
acid (Sia = Neu5Ac), N-acetylglucosamine (GlcNAc), mannose (Man), and fucose (Fuc).
STM images were interpreted by structures computed by DFT. Scale bar is 0.5 nm.
Fig. 3. STM imaging of single N-glycoproteins. (A) Sequence of RNase B (124aa) with one N-linked glycan
at Asn34. Imaging single unfolded RNase B molecules reveals the N-glycan position along the protein 

backbone and the N-glycan structure found on individual glycoproteoforms one molecule at a time, as shown 
in (B) for MangGIcNAco and in (C) for MansGIlcNAc». The position of Asn34 is estimated by a red dot along 
the protein backbone (white dashed line). Scale bar is 1 nm. 
To illustrate the full perspective of single-glycoprotein
imaging, we examined an O-glycoprotein
fragment derived from the large
mucin MUC1. Mucins are considered one of
the last frontiers in glycoanalytics that has re- 
mained largely unexplored (34-36) despite their 
considerable importance in mucosal biology 
and host-pathogen interactions (37). Structural 
analysis of mucins and their multiple glyco- 
sylation sites is challenging because of their 
enormous size and dense decoration of O- 
glycans, resulting in heterogeneity and resistance 
to protease digestion (34-36). Here we show 
that it is possible to analyze such heavily O- 
glycosylated proteins one molecule at a time 
(Fig. 4) by using the soft deposition and imag- 
ing of single glycoproteins on a surface (fig. S7). 

We imaged a representative fragment of the
densely O-glycosylated tandem repeat region of
human MUC1 mucin as an O-glycoprotein reporter
with relatively homogeneous trisaccharide
O-glycans [“core 3” (Galβ1-4GlcNAcβ1-3GalNAcα1-O-S/T)]
(Fig. 4). For this, we employed our
recently developed cell-based strategy using
genetically glycoengineered HEK293 cells for
recombinant production of mucin reporter glycoproteins
with custom-designed O-glycosylation
(35) (Fig. 4A) (see Methods). The MUC1 O-glycoprotein
reporter, featuring 34 potential
O-glycosylation sites, was analyzed by intact
and bottom-up mass spectrometry, as well as
profiling of released O-glycans, which revealed
a relatively homogeneous mixture of O-glycans
(mainly core 3) and number of O-glycans
(mainly 19 to 23 glycans) (35, 38) (see also
figs. S1H and S9 for MUC1 sample used in
single-molecule imaging experiments).

Imaging single MUC1 O-glycoproteins allows 
direct observation of the variation in number, 
structure, and attachment sites of O-glycans 
on the protein backbone, as shown in examples 
with 27, 21, and 20 O-glycans (Fig. 4, B to D). 
On the individual MUC1 molecules, we found
the O-glycans mainly to be the core 3 trisaccharides
with the occasional sialylated core
3 tetrasaccharide (Siaα2-3Galβ1-4GlcNAcβ1-3GalNAc)
in agreement with the glycoprofiling
analysis (35, 38) (fig. S9). Most importantly,
the direct imaging of MUC1 clearly revealed the
positions of each O-glycan at S and T sites along
the protein (red dots in Fig. 4, B to D) (see fig.
S10 for an example). Analysis of the O-glycan
positions on 18 MUC1 O-glycoproteins observed
(table S1) revealed three prevalent patterns
of O-glycan distribution on the MUC1 tandem
repeats as -TS-T-ST- ; -TS-T-ST- ; and -TS-T-ST-
(bold underlined indicates glycosylated,
see table S2). These patterns are largely in
agreement with the predicted O-glycosylation
sequence of the MUC1 from both in vitro (39)
and in vivo (38) enzyme specificity analysis.
The O-glycosylation process is a complex event 
with multiple isoenzymes (polypeptide GalNAc- 
transferases) each attaching O-glycans at select 
Fig. 4. STM imaging of single O-glycoproteins. (A) Sequence of MUC1 reporter (148aa) containing 6.5
tandem repeats of 20 amino acids (GVTSAPDTRPAPGSTAPPAR) decorated with O-glycans at the Ser (S)
and Thr (T) residues (total of 34 potential O-glycosites). Imaging single MUC1 proteins reveals the number,
the structure, and the attachment site of O-glycans decorating the protein, as shown for a glycoproteoform
with 27 O-glycans in (B), 21 O-glycans in (C), and 20 O-glycans in (D). The positions of S and T residues
are indicated by red dots along the protein backbone (white dashed line). The unannotated STM images
are given in fig. S8. Scale bar is 1 nm.


positions in proteins, and the O-glycosylation
of the five possible sites in the MUC1 tandem
repeat requires sequential orchestrated action
of multiple isoenzymes (39). Further analysis
of the STM results (table S1) revealed on average
3.4 O-glycans per tandem repeat with
preferred positions at T in VTSA (87% occupied)
and ST in GSTA (78 and 83% occupied,
respectively) and less preferred positions at
S in VTSA (44% occupied) and T in PDTR (55%
occupied). These results corroborate our previous
studies of the MUC1 reporter protein
with core 3 O-glycans (35, 38), for which we
found reduced occupancy of O-glycans at S in
VTSA and T in PDTR (38, 40, 41). Direct STM
imaging thereby yields detailed snapshots of 
single-molecule glycoproteoforms that can 
unveil potential interplay between glycosyla- 
tion at different positions in proteins and the 
glycan structures that may be assembled at 
these positions (table S2). In addition, STM 
imaging allows direct observation of glycan- 
glycan interactions dictating the overall shape 
of the protein backbone. We expect the single-molecule
analysis approach to be widely applicable
to glycoproteins that can be electrosprayed
in unfolded states, regardless of size and num- 
bers of attached glycans. In case of increas- 
ingly dense glycans causing proteins to unfold 
incompletely, the STM tip could be used to 
further unfold the protein to clarify its pri- 
mary structure (fig. S11). 
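
As a consistency check on the site occupancies quoted above, the expected number of O-glycans per tandem repeat is simply the sum of the per-site occupancies, because an expectation is additive whether or not the sites are glycosylated independently. The sketch below uses only the percentages given in the text; the dictionary labels and the extrapolation to 6.5 repeats are illustrative assumptions, not part of the original analysis.

```python
# Per-site occupancies within one 20-residue MUC1 tandem repeat
# (GVTSAPDTRPAPGSTAPPAR), as quoted in the text.
occupancy = {
    "T in VTSA": 0.87,
    "S in VTSA": 0.44,
    "T in PDTR": 0.55,
    "S in GSTA": 0.78,
    "T in GSTA": 0.83,
}

# Expected glycans per repeat: sum of the site occupancies.
expected_per_repeat = sum(occupancy.values())

# ASSUMPTION: every one of the 6.5 repeats in the reporter follows the
# same occupancies; glycosites outside the repeats are ignored.
n_repeats = 6.5
expected_in_repeats = expected_per_repeat * n_repeats

print(f"expected O-glycans per repeat: {expected_per_repeat:.2f}")
print(f"expected O-glycans across {n_repeats} repeats: {expected_in_repeats:.1f}")
```

The summed occupancies come to about 3.5 per repeat, close to the reported average of 3.4; the text does not state how the percentages were rounded or how the partial repeat was counted, so the small difference should not be over-interpreted.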


Conclusions 


Our combination of electrospray deposition 
and scanning tunneling microscopy analysis
provides an opportunity to look directly at the 
primary structures of complex glycoconju- 
gates, including glycoproteins with multiple 
glycans attached. This technology, corroborated 
by DFT modeling, should enable direct obser- 
vation of diverse PTMs on biomolecules (42, 43) 
as well as structures of glycoconjugates that 
are well beyond today’s analytical capabilities, 
such as proteoglycans (44), glycoRNAs (45), 
lipopolysaccharides (46), and carbohydrate 
vaccines (16). Although the present work demonstrates
that prior knowledge of the amino acid
sequence of the glycoproteins is advantageous 
to enable interpretation of the STM images, 
we recognize that STM imaging can still be 
further improved to identify each amino acid 
and monosaccharide in a molecule and thus 
identify single proteins/glycoproteins in complex 
biological mixtures. These improvements in- 
clude the use of a functionalized tip to resolve 
and distinguish covalent bonds in a molecule 
(47), as well as the use of tunneling spectros- 
copy (48), nuclear spin detection (49), or optical 
fingerprinting (50-52) to identify electronic 
signatures of specific atoms or functional 
groups in molecules. With these improve- 
ments, we expect that STM can contribute 
to identification of unknown glycoproteins 
or glycolipids, which may ultimately lead 
to the discovery of individual glycoproteins 
and glycolipids in complex cellular mixtures, 
particularly in the context of glycoproteomics 
and glycolipidomics studies. Complementing 
these improvements with automated struc- 
ture solvers (53-55) or tip preparation will 
increase the throughput of scanning probe 
microscopy and create opportunities to solve 
previously intractable problems in single- 
molecule (bio)analytical chemistry. 
Neurons relay information via specialized presynaptic compartments for neurotransmission. Unlike
conventional organelles, the specialized apparatus characterizing the neuronal presynapse must 
form de novo. How the components for presynaptic neurotransmission are transported and 
assembled is poorly understood. Our results show that the rare late endosomal signaling lipid 
phosphatidylinositol 3,5-bisphosphate [PI(3,5)P2] directs the axonal cotransport of synaptic vesicle 
and active zone proteins in precursor vesicles in human neurons. Precursor vesicles are distinct 
from conventional secretory organelles, endosomes, and degradative lysosomes and are transported by 
coincident detection of PI(3,5)P2 and active ARL8 via kinesin KIF1A to the presynaptic compartment. 

Our findings identify a crucial mechanism that mediates the delivery of synaptic vesicle and active zone 


proteins to developing synapses. 


Membrane-bound compartments are a
hallmark of eukaryotic cells (1, 2). In
contrast to cell division, which allows
for organelles to be inherited from the
mother cell (1, 2), the specialized secretory
apparatus for neurotransmission must
form de novo during the differentiation of stem 
cells into neurons. How synaptic vesicles (SVs) 
(3) and the presynaptic active zone (4, 5) scaf- 
folds are formed, transported, and assembled 
into a functional presynaptic compartment for 
neurotransmission in developing mammalian 
central nervous system (CNS) neurons is un- 
known (6, 7). Most SV proteins are transmem- 
brane proteins that are somatically synthesized 
in the endoplasmic reticulum (ER), from where 
they are exported to the trans-Golgi network 
(8), a compartment that is largely absent from 
axons and presynaptic nerve terminals (9). As 
a consequence, presynaptic biogenesis requires 
the formation of SV protein-containing precur- 
sor vesicles (PVs) that are axonally transported 
to the nascent presynapse (10) via the kinesin
KIF1A (called UNC104 in Caenorhabditis elegans
and UNC104/IMAC in Drosophila melanogaster) 
(7, 11-16). Early studies of mouse peripheral 
axons in situ and cultured rat hippocampal
neurons revealed the presence of heterogeneously
sized tubulovesicular structures that
may represent presynaptic precursor organelles 
(17-19). Later works suggested the existence of 
specialized axonal transport vesicles for active 
zone proteins (20, 21). Whether the alleged carriers
for SV and active zone proteins are common
(22) or distinct organelles, what their cell
biological identity is, and how this identity
relates to the machinery for axonal transport
remain to be fully elucidated.


Axonally transported PVs represent a distinct 
type of organelle 


We adopted a previously described protocol 
for the differentiation of human induced pluri- 
potent stem cells (iPSCs) into glutamatergic 
neurons (iNs) (fig. S1A) (23). Coculture of these
developing iNs with mouse astrocytes enabled 
the formation of functional pre- and postsyn- 
aptic compartments (fig. S1, B to E). Optical 
imaging of mature iNs (days in vitro 28 to 30) 
expressing a chimera between the SV protein 
Synaptophysin and pH-sensitive pHluorin- 
green fluorescent protein (GFP) and stimu- 
lated with varying numbers of action potentials 
revealed potent exo-endocytic responses with 
kinetic parameters (fig. S1, F and G) similar 
to the kinetics of exo-endocytosis measured in 
mouse hippocampal neurons (24). Human iNs 
were capable of synaptic transmission and dis- 
played short-term depression and asynchronous 
release in response to high-frequency stimulation
(fig. S1H). To follow the development of
functional presynaptic compartments, we mon- 
itored the axonal transport of presynaptic 
components in iNs using multicolor spinning 
disk confocal imaging paired with automated 
dual-channel tracking and kymographic analysis.
Synaptophysin fused to monomeric red fluorescent
protein (mRFP) was present in diffraction-
limited motile puncta, most of which under- 
went anterograde axonal transport [see Fig. 
1, B and C, for endogenous Synaptophysin- 
enhanced GFP (eGFP)]. Anterogradely mov- 
ing Synaptophysin puncta contained other 
SV proteins such as Synaptobrevin 2 (also 
referred to as VAMP2) (fig. S1, J and L) or the 
vesicular glutamate transporter 1 (VGLUT1) 
(fig. S1L), which suggests that SV proteins
might be axonally transported as a single entity. 
We found that further key components of the 
presynapse, such as the active zone proteins 
Bassoon and MUNC13-1 or the presynaptic adhesion
molecule Neurexin1β, also underwent anterograde
axonal cotransport with SV proteins
(fig. S1, K and L). Synaptophysin-mRFP was
similarly cotransported with vGLUT1, the active
zone proteins Bassoon and MUNC13-1, and
Neurexin1β, but not with mitochondria, in
mouse hippocampal neurons (Fig. 1F). To corroborate
these findings, we generated genome-engineered
iNs expressing Synaptophysin-eGFP
from the endogenous locus (fig. S1M). Endogenous
Synaptophysin-eGFP was largely contained in 
diffraction-limited anterogradely moving puncta 
(Fig. 1, B to E) together with other SV and ac- 
tive zone proteins (Fig. 1, B to D, and movies S1 
and S2). Inhibition of protein synthesis depleted
anterogradely moving Synaptophysin
vesicles (Fig. 1E).
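
As a rough illustration of how tracked puncta from a kymograph can be sorted into anterograde, retrograde, and stationary classes, the sketch below classifies each track by its net displacement along the axon. The track format, the `min_displacement_um` threshold, the function names, and the example values are assumptions made for illustration; they are not taken from the tracking pipeline used in this study.

```python
from collections import Counter


def classify_track(positions_um, min_displacement_um=1.0):
    """Classify one track from its positions along the axon (in micrometers).

    Positions are assumed to increase with distance from the soma, so a
    positive net displacement corresponds to anterograde movement.
    The threshold is a hypothetical value, not one reported in the paper.
    """
    net = positions_um[-1] - positions_um[0]
    if net >= min_displacement_um:
        return "anterograde"
    if net <= -min_displacement_um:
        return "retrograde"
    return "stationary"


def summarize_tracks(tracks, min_displacement_um=1.0):
    """Count tracks per class and return the anterograde fraction."""
    counts = Counter(classify_track(t, min_displacement_um) for t in tracks)
    total = sum(counts.values())
    fraction = counts["anterograde"] / total if total else 0.0
    return counts, fraction


# Hypothetical tracks: position (micrometers) sampled at successive frames.
example_tracks = [
    [0.0, 1.5, 3.2, 5.0],    # moves away from the soma
    [10.0, 8.0, 6.5],        # moves toward the soma
    [4.0, 4.1, 3.9, 4.0],    # jitters in place
    [2.0, 2.8, 4.4],         # moves away from the soma
]
counts, anterograde_fraction = summarize_tracks(example_tracks)
print(dict(counts), anterograde_fraction)
```

A net-displacement rule is the simplest possible choice; real kymograph analysis typically also handles pauses and direction reversals within a track.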

Next, we tested whether axonally trans- 
ported PVs contain elements of the secretory 
pathway. Markers of the ER such as SEC61β
or KDEL, ARF1 (a marker for Golgi-derived
transport carriers), the cis-Golgi protein Giantin, 
trans-Golgi network protein 2, or RAB6 (a 
marker for post-Golgi vesicles) were absent 
from axonally transported PVs (fig. S2, A and 
B). Anterogradely trafficked PVs were also seg- 
regated from axonally transported mitochon- 
dria (Fig. 1F and fig. S2A). Axonally transported 
PVs displayed moderate cotransport with RAB5, 
a protein present on early endosomes and on 
SVs (3, 25), whereas the late endosomal mark- 
er RAB7 was largely absent from PVs. Axonally 
trafficked PVs were poorly accessible to recy- 
cling endosomal markers such as internalized 
transferrin (fig. S2, A and B). By contrast, an- 
terogradely moving PVs in ~50 to 60% of cases 
contained late endosomal and lysosomal mem- 
brane proteins such as LAMP1, CD63, and the 
lysosome-associated small guanosine triphos- 
phatase (GTPase) ARL8B (fig. S2, A and B). 
Cotransport of PVs with lysosomal membrane 
markers such as ARL8B (fig. S2B and movie 
S3) and LAMP1 (fig. S2B) was confirmed in iNs
endogenously expressing Synaptophysin-eGFP
(Fig. 1, G and H) and, conversely, in LAMP1-eGFP
knock-in iNs (Fig. 1I and fig. S1M). Because
axonal PVs are involved in the delivery
of newly synthesized presynaptic proteins, we

Fig. 1. Nascent SV and active
zone proteins are cotransported in iN axons. (A) Scheme of the presynaptic
compartment. AZ, active zone; Cav, voltage-sensitive calcium channel; nCAM, neural cell
adhesion molecule; PSD, postsynaptic density; SV, synaptic vesicle.
(B and C) Kymographs showing the trafficking of endogenous SYP^endo-eGFP with
VAMP2-SNAP (B) or MUNC13-1-SNAP (C). The x axis is 30 μm; the y axis is 1 min. Vertical
lines indicate static foci. Scale bar, 10 μm. (D) Fraction of anterogradely
moving SYP^endo-eGFP vesicles cotrafficking with SYP-SNAP (87.3 ± 3.2%),
VAMP2-SNAP (87.1 ± 4.3%), and MUNC13-1-SNAP (88.4 ± 7.0%). n = 4 experiments
(≥50 vesicles each). (E) Number of anterogradely trafficking SYP^endo-eGFP vesicles
(per min per 30-μm length) in proximal axons of iNs on days 12 to 13 after
24-hour incubation with DMSO, cycloheximide (CHX), or anisomycin:
DMSO, 11.5 ± 1.3; CHX, 2.2 ± 0.2; anisomycin, 2.1 ± 0.1. n = 3
experiments (≥50 vesicles each). One-way analysis of variance
(ANOVA) followed by Dunnett's post hoc test was performed; **p < 0.01.
(F) Fraction of anterogradely moving vesicles colabeled for SYP-mRFP and
SYP-eGFP (92.8 ± 2.6%), vGlut1-venus (89.6 ± 3.3%), eGFP-Bassoon (79.2 ± 7.5%),
MUNC13-1-eGFP (82.7 ± 5.9%), Neurexin1β-eGFP (83.4 ± 3.4%), and Mito-Halo
(6.5 ± 1.8%) in proximal axons of mouse hippocampal neurons. n = 3
experiments (≥50 vesicles each). (G) Kymographs illustrating
cotrafficking of endogenous SYP^endo-eGFP with ARL8B-iRFP in iNs. The
x axis is 30 μm; the y axis is 1 min. Vertical lines indicate static foci. Scale
bars, 10 μm. (H) Fraction of anterogradely moving SYP^endo-eGFP vesicles colabeled
for LAMP1-mRFP (61.0 ± 10.4%) or ARL8B-iRFP (56.3 ± 6.3%). n = 5 experiments
(≥50 vesicles each). (I) Fraction of anterogradely moving LAMP1^endo-eGFP vesicles
colabeled for SYP-mRFP (44.0 ± 6.4%) or Syt1-mRFP (62.9 ± 8.9%). n = 4 experiments
(≥50 vesicles each). (J) Three-dimensional reconstruction of PVs (green) recruited


expected them to be distinct from degradative 
lysosomes. Consistently, we found that ante- 
rogradely transported PVs lack lysosomal ca- 
thepsin activity (fig. S2C) and are nonacidic (fig. 
S2F). Lysotracker-positive degradative lyso- 
somes that contain endogenous LAMP1-eGFP 
were mostly found in neuronal somata, although
motile acidic lysosomes were detectable
in axons (fig. S2, D and E) [consistent with
(26)]. We conclude that axonally transported
PVs are distinct from conventional secretory 
organelles, recycling endosomes, and mature 
lysosomes. Instead, they may represent a neu- 
ron-specific biogenesis organelle, which derives 
from a pathway that sorts lysosomal membrane 
proteins [see (26-28)]. 

To characterize PVs ultrastructurally, we devised
a chemical genetic strategy to enrich these

to mitochondrion (red). Scale bar, 500 nm. (K) Ultrastructure of PVs recruited to
mitochondria analyzed by FIB-SEM. Asterisks mark vesicles. Scale bars, 100 nm.
(L) Cumulative plot of diameters of PVs and SVs. (M) Violin plot of the distribution of
PV sizes. Whiskers show the minimum and maximum, box borders indicate first
and third quartiles, and the line indicates the median. Data are mean ± SEM.


rare and transient organelles at a defined axonal 
location, thereby enabling their ultrastructural 
characterization by correlative light and elec- 
tron microscopy (CLEM). We coexpressed the 
FRB domain of mTOR kinase targeted to the 
outer membrane of mitochondria (Mito-FRB) 
with a chimera between Synaptophysin and 
FK506-binding protein 12 (SYP-FKBP). In the 
absence of rapamycin, SYP-FKBP localized to
axonal PVs that displayed no colocalization with
Mito-FRB-positive mitochondria (fig. S2, I and 
J). Rapamycin induced FRB-FKBP heterodimer- 
ization and caused PVs marked by SYP-FKBP 
to be recruited to the mitochondrial surface. 
Sequestration of SYP-FKBP-containing PVs 
at mitochondria occurred in tandem with the 
mitochondrial accumulation of endogenous 
Piccolo, a major active zone protein, and the 
SV calcium sensor Synaptotagmin 1 (fig. S2, I 
and J). We then determined the ultrastructure 
of mitochondrially sequestered presynaptic PVs 
in proximal axons by correlative spinning disk 


Fig. 2. Axonal transport of PVs is controlled by
ARL8A and/or ARL8B and KIF1A. (A) Kymographs
of trajectories of SYP-eGFP vesicles in WT or ARL8A/B
double-KO iNs. The x axis is 30 μm; the y axis is
1 min. Scale bar, 10 μm. DKO, double KO. (B) Number of
anterogradely moving SYP-eGFP vesicles (per min
per 30-μm length): WT, 9.2 ± 0.6; WT + ARL8B-Myc,
8.7 ± 1.4; ARL8A/B double KO, 2.6 ± 0.9; ARL8A/B
double KO + ARL8B-Myc, 7.9 ± 1.7. n = 3 experiments
(≥50 vesicles each). One-way ANOVA followed by
Dunnett's post hoc test was performed. (C) Scatterplot
of ARL8B-mCherry interacting proteins in iNs. Blue,
lysosomal proteins; green, kinesins; orange, SV proteins.
LFQ, label-free quantification. (D) Coimmunoprecipitation
of ARL8B-mCherry with KIF1A-eGFP in developing iNs.
Bound proteins were detected by immunoblotting.
Total lysate blots represent 5% of material used as
input for coimmunoprecipitation. IP, immunoprecipitate.
(E) Kymographs illustrating cotrafficking of KIF1A-eGFP
with SYP-mRFP (top) or ARL8B-mCherry (bottom) in
iNs. The x axis is 30 μm; the y axis is 1 min. Scale bars,
10 μm. (F) Fraction of anterogradely moving SYP-mRFP
(76.9 ± 6.5%), MUNC13-1-SNAP (79.7 ± 5.9%), or
ARL8B-mCherry (64.1 ± 10.1%) puncta cotrafficking with
KIF1A-eGFP. n = 5 experiments (≥50 vesicles each).
(G) Kymographs of trajectories of SYP-eGFP vesicles in
iNs that are WT, KIF1A KO, or KO expressing full-length
KIF1A (KIF1A-FL) or Rigor mutant KIF1A. The x axis
is 30 μm; the y axis is 1 min. Scale bar, 10 μm.
(H) Number of anterogradely moving, retrogradely
moving, or stationary SYP-eGFP vesicles (per min
per 30-μm length): WT, 9.7 ± 1.7; KIF1A KO, 2.4 ± 0.5;
KO + KIF1A-FL, 8.7 ± 0.3; KO + KIF1A-rigor, 2.4 ± 0.3.
n = 3 experiments (≥50 vesicles each). One-way ANOVA
followed by Dunnett's post hoc test was performed.
(I) Kymographs showing trajectories of SYP-mRFP
vesicles in primary mouse hippocampal neurons
expressing full-length KIF1A or Rigor mutant KIF1A.
Scale bar, 10 μm. (J) Number of anterogradely moving
SYP-mRFP vesicles (per min per 30-μm length) in
mouse hippocampal neurons: KIF1A-FL, 8.1 ± 1.1; KIF1A-Rigor,
2.5 ± 0.8. n = 3 experiments (≥50 synaptic
puncta each). t test was performed. For all graphs,
***p < 0.001, **p < 0.01, and ns is p > 0.05. Data are
mean ± SEM.

confocal light and focused ion beam milling
scanning electron microscopy (FIB-SEM) (fig. 
S2G). FIB-SEM and three-dimensional (3D) 
reconstruction (Fig. 1J) revealed the pres- 
ence of a population of vesicles and tubules 
in direct contact with the mitochondrial sur- 
face (Fig. 1K, I to VI). Quantitative morphomet- 
ric analysis showed that >90% of these had 
diameters of 50 to 400 nm (Fig. 1L) with a mean 
size of 166 ± 14 nm and a median diameter of
87 nm (Fig. 1M), a size distribution distinct from 
that of mature SVs (Fig. 1L). Minor fractions 
of vesicles contained electron-dense material
(5%) (Fig. 1K, III) or comprised large multivesicular
bodies (6%) (Fig. 1K, VI). These large
multivesicular bodies may originate from the 
degradative pathway and are unlikely to relate 
to presynaptic biogenesis. No vesicle or tubule 
accumulations were found in the absence of 
rapamycin (fig. S2H). These data (Fig. 1, J to 
M) show that presynaptic biogenesis in mam- 
malian CNS neurons involves vesicular and 
tubular compartments of 67 nm (Q1, lower 
quartile) to 220 nm (Q3, upper quartile), consistent
with the reported ultrastructure of alleged
presynaptic transport packets in mouse
saphenous nerves in situ (17) and in rat hippocampal
axons (19).

Fig. 3. Loss of ARL8A and ARL8B or KIF1A/Unc104 impairs presynaptic
biogenesis and function. (A) Representative confocal images of WT iNs, ARL8A/B
double-KO iNs, and double-KO iNs expressing ARL8B-c-myc at day 30 that were
immunostained for presynaptic Bassoon (BSN), postsynaptic Homer1, and MAP2.
Scale bar, 5 μm. (B) Number of synapses and number of pre- and postsynaptic puncta
per 20-μm dendrite length. n = 3 experiments (≥50 puncta each). (C and D) Representative
confocal images of WT iNs, ARL8A/B double-KO iNs, and double-KO iNs
expressing ARL8B-c-myc at day 30 immunostained for Synaptophysin (C), Bassoon
(D), and Homer1 [(C) and (D)]. Scale bars, 10 μm. (E to H) Quantified mean fluorescence
intensity of pre- and postsynaptic proteins in WT iNs, ARL8A/B double-KO iNs,
and double-KO iNs expressing ARL8B-c-myc (day 30). n = 3 experiments (≥100 synapses
each). Shown are Synaptophysin (SYP) (E), Bassoon (F), Cav2.1 (G), and Homer1
(H). (I) Number of synapses and of pre- and postsynaptic puncta per 20-μm dendrite
length in WT iNs, KIF1A KO iNs, and KO iNs expressing KIF1A (day 30). n = 3 experiments
(≥50 puncta each). (J to M) Quantified mean fluorescence intensity of
pre- and postsynaptic proteins in WT iNs, KIF1A KO iNs, and KO iNs expressing
KIF1A (day 30). See (E) to (H) for abbreviations. n = 3 experiments (>100 synaptic
puncta each). (N) Defective presynaptic biogenesis in absence of Unc104-mediated
PV delivery in D. melanogaster. Reduced amounts of VGLUT at Unc104 mutant
NMJs. WT, 100 ± 11; Unc104, 6.3 ± 0.8. n = 10 (WT) and 10 (Unc104) NMJs. t test was
performed. (O) Schematic representation of pre- and postsynaptic calcium sensors
SYP-GCaMP6f and Xph20-GCaMP7f. (P) Mean normalized fluorescence (ΔF/F0) response of
Xph20-GCaMP7f to 100-action potential (10 Hz, 10 s) stimulation in WT,
ARL8A/B double-KO, and KIF1A KO iNs. a.u., arbitrary units; AUC, area under the curve.
Data were normalized to prestimulation. n = 3 independent experiments.
In (B), (E) to (M), and (P), one-way ANOVA followed by Dunnett's post hoc test was
performed. For all graphs, ****p < 0.0001, ***p < 0.001, **p < 0.01,
and ns is p > 0.05. Data are mean ± SEM.
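
The size statistics quoted for PVs (mean ± SEM, median, lower and upper quartiles, and the fraction of carriers within a diameter window) can be computed from a list of measured diameters as sketched below. This is a minimal illustration using only the Python standard library; the function name and the diameter values are hypothetical and do not reproduce the FIB-SEM measurements, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which convention was used.

```python
import statistics


def summarize_diameters(diameters_nm, low_nm=50.0, high_nm=400.0):
    """Return summary statistics for a list of vesicle diameters (nm)."""
    n = len(diameters_nm)
    mean = statistics.fmean(diameters_nm)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(diameters_nm) / n ** 0.5
    # Quartile cut points (Q1, median, Q3); "inclusive" is an assumed convention.
    q1, median, q3 = statistics.quantiles(diameters_nm, n=4, method="inclusive")
    in_range = sum(low_nm <= d <= high_nm for d in diameters_nm) / n
    return {
        "n": n,
        "mean": mean,
        "sem": sem,
        "q1": q1,
        "median": median,
        "q3": q3,
        "fraction_in_range": in_range,
    }


# Hypothetical diameters (nm), chosen only to show a right-skewed distribution.
example_diameters = [60, 70, 80, 90, 150, 220, 400]
summary = summarize_diameters(example_diameters)
print(summary)
```

In a right-skewed distribution such as this one, the mean exceeds the median, which is why both values, together with the quartiles, are informative about carrier size.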


Anterograde axonal transport and 
presynaptic delivery of PVs depend on 
ARL8A and/or ARL8B and KIF1A 


Prior studies in invertebrate models (29-31) 
have implicated the small GTPase ARL8 (com- 
pare with Fig. 1), a regulator of anterograde 
lysosome motility (32, 33), in presynaptic 
biogenesis. If the organelles identified ultra- 
structurally (Fig. 1, J to M) represent PVs, 
their axonal transport should require ARL8. 
We generated knockout (KO) iPSC lines 
lacking either one or both human isoforms of 
ARL8 (termed ARL8A and ARL8B) (fig. S3A).
Complete loss of ARL8A and ARL8B in three
independent double KO lines strongly impaired 
anterograde axonal transport of Synaptophysin- 
positive PVs (Fig. 2, A and B; fig. S3, A and B; 
and movie S4), a phenotype restored by re- 
expression of ARL8A- or ARL8B-c-myc (Fig. 
2, A and B, and fig. S3D). Single loss of ARL8B
alone produced a mild phenotype (fig. S3C). 
We conducted an unbiased proteomic screen 
for ARL8 interacting factors. This analysis 
identified SV cargos and KIF1A (Fig. 2, C and D),
which is a kinesin that is implicated in axonal
transport of SV proteins (12-16, 34). KIF1A-eGFP
underwent efficient cotransport with
SV and active zone proteins and with ARL8B
(Fig. 2, E and F). CRISPR-Cas9-mediated KO
(fig. S3E) or short hairpin RNA (shRNA)-mediated
knockdown (fig. S3, F and L) of
KIF1A severely compromised anterograde
transport of PVs, and this defect was restored
by reexpression of wild-type (WT) KIF1A but
not by a disease-associated KIF1A rigor mutant
(32, 35) (Fig. 2, G and H, and movie S5).
A similar blockade in anterograde axonal transport
of PVs was observed in primary mouse
hippocampal neurons expressing the KIF1A
rigor mutant (Fig. 2, I and J). Anterograde
axonal PV transport proceeded nearly unperturbed
in the near absence of classical kinesin
KIF5B or the kinesin adaptors DENN and
MADD (fig. S3, G to I and L) and was moderately
reduced upon depletion of FYCO1 (fig.
S3, J and M). To probe the interplay between
ARL8A and/or ARL8B and KIF1A, we monitored
the localization and axonal transport
dynamics of the C-terminal membrane-binding
(36) tail domain of KIF1A (KIF1A^tail). KIF1A^tail-eGFP
colocalized with Synaptophysin-mRFP-containing
PVs in WT neurons but was largely
diffusely localized in ARL8A and ARL8B
(ARL8A/B) double-KO iNs. Reexpression of
hyperactive ARL8B•GTP (Q75L, Gln75→Leu) but
not inactive ARL8B•GDP (T34N, Thr34→Asn)
rescued defective KIF1A^tail-eGFP localization
to anterogradely moving PVs (fig. S3, N and
O). Loss of KIF1A had no substantial impact on
the recruitment of ARL8B to PVs expressing
Synaptophysin-eGFP (fig. S3, P and Q).

If anterogradely transported PVs represented
precursor organelles for presynapse assembly, 
stalling their axonal transport should impair 
synapse assembly. Indeed, ARL8A/B double- 
KO iNs suffered from a severe reduction in the 
number of Bassoon puncta per dendrite length, 
whereas the density of postsynaptic Homer1
puncta was unaffected (Fig. 3, A and B). Moreover,
in comparison with synapses from WT
neurons, ARL8A/B double-KO iNs displayed a
profound overall reduction in the amounts of
SV and active zone proteins (Fig. 3, C to F, and
fig. S4). ARL8A and ARL8B loss did not affect
postsynaptic glutamate receptors (fig. S4, B, C,
and I) or Homer1 (Fig. 3, C, D, and H; and fig.
S4). ARL8A and ARL8B loss also did not affect
the delivery of presynaptic voltage-gated P/Q-type
calcium channels (Fig. 3G and fig. S4D),
which suggests that P/Q-type calcium channels
may be transported to synapses by a distinct
mechanism. Lentiviral reexpression of
ARL8B rescued defective presynaptic delivery
of SV and active zone proteins in ARL8A/B
double-KO iNs (Fig. 3, A to F, and fig. S4). Loss
of KIF1A phenocopied ARL8A/B double-KO
with respect to reduced density of presynaptic
Bassoon puncta, whereas postsynaptic Homer1
puncta density was unchanged (Fig. 3I and fig.
S5A). Synapses identified by Homer1, PSD-95,
or GLUN1 further showed a severe reduction
in the amounts of SV and active zone proteins
(Fig. 3, J and K, and fig. S5, B, C, E, and G),
whereas presynaptic calcium channels (Fig. 3L
and fig. S5D), Homer1 (fig. S5, B, E, and F), and
postsynaptic N-methyl-D-aspartate (NMDA) receptors
(Fig. 3M and fig. S5C) were unaltered.
Defective presynaptic delivery of SV and active
zone proteins was restored by reexpression of
KIF1A (Fig. 3, J and K, and fig. S5, B, C, E, and G).
Similar defects in presynaptic biogenesis that
were determined by VGLUT1 immunostaining
were observed in D. melanogaster larval neuromuscular
junction (NMJ) synapses from unc104
(the KIF1A ortholog) hypomorphic animals
(Fig. 3N and fig. S5, H and I). Defective presynaptic
biogenesis in KIF1A KO and ARL8A/B
double-KO iNs was paralleled by severely reduced
stimulus-evoked postsynaptic calcium
responses measured by Xph20-GCaMP7f (Fig. 3,
O and P), likely as a consequence of impaired
presynaptic glutamate release. Consistently, overexpression
of the transport-defective KIF1A rigor
mutant impaired SV exocytosis in primary
mouse hippocampal neurons (fig. S5K). By contrast,
presynaptic calcium influx was unaltered
in KIF1A KO or ARL8A/B double-KO iNs
(fig. S5J), in agreement with the lack of effect
of ARL8A/B or KIF1A loss on the amounts of
presynaptic voltage-gated calcium channels
(Fig. 3, G and L, and figs. S4D and S5D). Presynaptic
biogenesis in human neurons is thus
mediated by ARL8A/B- and KIF1A-dependent
axonal transport and delivery of PVs to nascent
synapses.


Anterograde axonal transport of PVs is
controlled by the rare signaling lipid PI(3,5)P2


Conflicting data regarding the role of the BORC 
complex, a postulated upstream activator of 
ARL8 (32, 37), in the axonal transport of SV 
proteins have been reported (29, 30, 38, 39). 
We found that lentiviral knockdown of the 
BORC subunits Myrlysin or Diaskedin im- 
paired anterograde axonal transport of LAMP1- 
positive lysosomes (fig. S6, A, C, and D) but had 
little effect on anterograde motility of PVs ex- 
pressing endogenous Synaptophysin-eGFP (fig. 
S6, E and F). We generated BORCS5 RNA in- 
terference (RNAi) lines in D. melanogaster. 
In contrast to loss of ARL8 (30), depletion of 
BORCS5 had no major effect on the presyn- 
aptic staining intensity of the SV protein VGLUT 
or on NMJ area (fig. S6, I to K), although BORCS5 
loss impaired autophagy and lysosome-mediated 
turnover of p62 (fig. S6, L to N). Notably, de- 
pletion of the BORC-specific subunits Myrlysin 
or Diaskedin rescued defective anterograde 
transport of PVs in ARL8A/B double-KO iNs 
(Fig. 4A and fig. S6H). This suggests that 
loss of Myrlysin or Diaskedin causes the up- 
regulation of an endogenous mechanism that 
renders PV transport ARL8A and ARL8B inde- 
pendent. Recent data from fibroblasts show 
that loss of BORC boosts the amounts of the 
rare late endosomal and lysosomal signaling 
lipid phosphatidylinositol 3,5-bisphosphate 
[PI(3,5)P2] (40-42) through a poorly understood
mechanism (37). We speculated that elevation
of PI(3,5)P2 in the absence of BORC
might promote anterograde PV transport. Depletion
of Myrlysin in iPSCs increased PI(3,5)P2
as monitored by ratiometric imaging with a
fluorophore-conjugated engineered p85α-cSH2
domain from class I PI 3-kinase (ep85α-cSH2)
used as a sensor (Fig. 4B and fig. S7A) (43). To
directly probe the function of PI(3,5)P2 in anterograde
axonal PV transport, we analyzed the
effects of pharmacological inhibitors of PI(3,5)P2
or PI(3)P synthesis. Inhibition of PI(3,5)P2 synthesis
by the PI(3)P 5-kinase PIKFYVE inhibitor
Apilimod (Fig. 4C) or of its precursor PI(3)P (see
fig. S7D) by VPS34-IN1, SAR405, or compound
19 (Fig. 4C and fig. S7, B and C) potently reduced
the fraction of anterogradely moving PVs. Moreover,
Apilimod caused a small reduction in transport
velocity of the remaining motile PVs [mean ±
SEM (μm/s): dimethyl sulfoxide (DMSO) control
(ctrl), 1.825 ± 0.1339 versus Apilimod, 1.595 ±
0.1306]. By contrast, anterograde axonal PV transport
proceeded unperturbed upon inhibition of
PI(4)P synthesis by PI4KIIIβ-IN10 or blockade of
type II PI 5-phosphate 4-kinase (fig. S7, B and
C). Anterograde transport of Synaptophysin-eGFP
was also inhibited upon genetic inhibition
of PI(3,5)P2 synthesis in PIKFYVE KO iNs (Fig.
4D and fig. S7, E and F), depletion of PIKFYVE
by lentiviral shRNA (Fig. 4A; fig. S6, G and H;
and movie S6), or hydrolysis of PI(3)P/PI(3,5)P2
on PVs by Rapalog-induced local recruitment

Fig. 4. PI(3,5)P2 synthesis regulates presynaptic biogenesis. (A) Motility of
SYP-mRFP vesicles in WT and ARL8A/B double-KO iNs transduced with control
lentivirus (shCTRL) or after lentiviral knockdown of Myrlysin, Diaskedin, or
PIKFYVE. n = 3 experiments (≥50 vesicles each). (B) PI(3,5)P2 measured by
quantitative ratiometric imaging in situ: WT, 0.05 ± 0.004; shMyrlysin, 0.15 ±
0.01. t test was performed. (C) Depletion of PI(3)P or PI(3,5)P2 inhibits
anterograde transport of SYP^endo-eGFP vesicles in iNs. Number of SYP^endo-eGFP
vesicles (per min per 30-μm length) moving anterogradely, retrogradely, or
remaining stationary in iNs treated (30 min) with DMSO, VPS34-IN1,
or Apilimod. DMSO, 11.1 ± 2.2; VPS34-IN1, 1.7 ± 0.9; Apilimod, 3.6 ± 1.1.
n = 5 experiments (≥50 vesicles each). (D) Number of anterogradely moving
SYP-eGFP vesicles in day 12 to 14 iNs (per min per 30-μm length): WT, 8.5 ± 0.9;
PIKFYVE KO 1, 3.8 ± 0.05; PIKFYVE KO 2, 2.5 ± 0.4; PIKFYVE KO 3, 4.3 ± 0.2.
n = 3 experiments (≥50 vesicles each). (E) 2xPH-(KIF1A) associates with
PI(3,5)P2 liposomes. Bound fraction (percentage of total): no liposomes,
1.7 ± 0.6%; no PI, 5.5 ± 2.0%; PI(3,5)P2, 60.9 ± 4.0%; PI(4,5)P2, 12.9 ± 5.5%;
PI(3)P, 24.0 ± 8.1%. n = 4 experiments. In (A) and (C) to (E), one-way
ANOVA followed by Dunnett's post hoc test was performed. (F) Colocalization of
endogenous mCherry-PIKFYVE with VAMP2 or SYT1 (green) in iN axons. Scale
bar, 10 μm. Shown at the bottom are line intensity scans from 10-μm segments.
(G) Representative confocal images of iN axons expressing SYP-mRFP and
microinjected with the PI(3,5)P2-sensing fluorophore-labeled ep85α-cSH2
domain. Scale bar, 10 μm. Shown at the bottom are line intensity scans from
10-μm segments. (H) Confocal images of WT and Fab1 mutant NMJs stained for
VGLUT (green) and horseradish peroxidase (HRP) as axonal membrane marker
(magenta). An overview is shown at the top. Scale bars, 5 μm. Insets show
magnified views. Scale bars, 2 μm. (I) Quantification of representative data
shown in (H). WT, 100 ± 11.7; Fab1-RNAi, 51.2 ± 5.5. n = 10 NMJs each. t test was


of the lipid phosphatase MTMR2 (fig. S7, G and
H). PIKFYVE knockdown abrogated anterograde
axonal transport of PVs in Myrlysin- or
Diaskedin-depleted ARL8A/B double-KO iNs
(Fig. 4A). Next, we tested whether PIKFYVE
acts locally on PVs to promote axonal transport.
Endogenous PIKFYVE coprecipitated
with SYP^endo-eGFP from iN lysates (fig. S8A),
and endogenous mCherry-PIKFYVE^endo (fig.
S8B) colocalized with PVs that were expressing
VAMP2/Synaptobrevin 2 (Pearson, 0.51 ± 0.03)
or Synaptotagmin 1 (Pearson, 0.44 ± 0.02) in
PIKFYVE knock-in iNs (Fig. 4F). Moreover,
axonal PVs colocalized with the PI(3,5)P2-sensing
engineered p85α-cSH2 domain in live
iNs (Fig. 4G) and co-stained with recombinant
SNXA, a selective PI(3,5)P2-binding protein (44),
in fixed iNs (fig. S8C). Colocalization of PVs with
either of these probes was abrogated by inhibition
of PI(3,5)P2 synthesis (Fig. 4G and fig. S8C).
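
The colocalization values quoted above are Pearson coefficients between two fluorescence channels. As a minimal sketch of what such a coefficient measures, the function below computes the Pearson correlation between paired pixel intensities taken from the same positions in two channels. The function name and the intensity values are hypothetical; the image processing actually applied in the study (background subtraction, region selection, thresholding) is not described here and is not reproduced.

```python
import math


def pearson_coefficient(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    n = len(channel_a)
    if n < 2:
        raise ValueError("at least two pixels are required")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical line-scan intensities along an axon segment (arbitrary units).
red_channel = [10, 40, 12, 55, 11, 38, 9]
green_channel = [8, 30, 15, 50, 20, 22, 10]
r = pearson_coefficient(red_channel, green_channel)
print(round(r, 2))
```

Values near 1 indicate that the two signals rise and fall together, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion; intermediate values are consistent with partial overlap of the two labels.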

KIF1A harbors a phosphoinositide-binding
pleckstrin homology (PH) domain (45). We
found that a KIF1A mutant that lacks its PH
domain fails to rescue defective axonal transport
of PVs in KIF1A KO iNs (fig. S9, A and B,
and movie S5). Glutathione S-transferase-fused
recombinant PH domain of KIF1A directly
bound to PI(3,5)P2 and, to a lesser extent, to
PI(3)P in vitro (Fig. 4E and fig. S9C). When
expressed in human iNs, KIF1A^tail-GFP (fig. S9,
D and E) or its PI(3,5)P2-binding PH domain
(fig. S9, H and I) was recruited to anterogradely
moving axonal PVs. Inhibition of PI(3,5)P2 synthesis
or genetic depletion of PIKFYVE caused
the partial dissociation of KIF1A^tail-GFP (fig.
S9, D to G) or mCherry-PH-(KIF1A) (fig. S9, H
and I) from motile PVs. A PH domain mutant
of KIF1A that lacked the ability to bind PI(3,5)P2
(fig. S9, J and K) displayed a greatly reduced
ability to associate with axonally transported
PVs (fig. S9, H and I). Rapalog-induced hydrolysis
of PI(3)P or PI(3,5)P2 on PVs by MTMR2
displaced full-length KIF1A-eGFP from PVs (fig.
S9L). These data indicate that PI(3,5)P2, among
possible other functions, might contribute to
recruiting KIF1A to axonal PVs.

performed. For all graphs, ****p < 0.0001, ***p < 0.001, **p < 0.01, and ns
is p > 0.05. Data are mean ± SEM.

Lastly, we analyzed the long-term consequences 
of reduced PI(3,5)P2 synthesis for presynaptic
biogenesis. iNs depleted of PIKFYVE displayed
reduced synaptic amounts of the SV marker
Synaptophysin and the active zone protein
Bassoon (fig. S10, A to D), whereas postsynaptic
Homer1 remained unchanged (fig. S10E). Genetic
depletion of the D. melanogaster PIKFYVE
ortholog Fab1 partially phenocopied loss of
KIF1A/UNC104 (compare with Fig. 3N), show- 
ing a reduction in SV proteins such as VGLUT 
(Fig. 4, H and I) and a possibly compensatory 
increase in NMJ length (fig. S10, F to I). 


Discussion 


The biology behind the origin of transport 
organelles that are responsible for the assem- 
bly of the presynaptic compartment for neuro- 
transmission in the mammalian CNS and in 
humans is a long-standing matter of investiga- 
tion and remains to be fully elucidated. Our 
findings demonstrate that mammalian pre- 
synaptic biogenesis occurs by axonal transport 
of PVs that are marked by the endolysosomal 
signaling lipid PI(3,5)P2 (fig. S11). These PVs
carry not only endogenous SV cargos but also 
active zone proteins as well as Neurexin1β to
nascent presynapses, consistent with data in 
D. melanogaster (30) and C. elegans (31, 46) and 
with earlier studies in mammalian CNS neurons 
(19, 22). PVs comprise notably heterogeneous 
vesicular and tubular carriers that resemble the 
organelles accumulated in rat proximal axons 
in response to anterograde transport blockade 
observed decades ago (17) yet are clearly distinct 
in morphology, shape, and size from mature 
SVs. PVs are devoid of markers of the secretory 
pathway and distinct from recycling endosomes 
and from mature degradative lysosomes. In- 
stead, PVs represent a neuron-specific organelle 
that may derive from a pathway that sorts lyso- 
somal membrane proteins (28, 47, 48). 

We further show that axonal transport of 
presynaptic PVs depends on the poorly understood
rare signaling lipid PI(3,5)P2. PI(3,5)P2
has been found to affect multiple organelles 
within the endolysosomal system, including 
late endosomes and lysosomes, macropino- 
somes (40-42), and recycling endosomes (49). 
Moreover, reduced PIKFYVE function was 
found to impair synapse formation (50). The 
ARL8-KIF1A-PI(3,5)P2 pathway for presynaptic
biogenesis may thus reflect the relationship 
of SVs and related secretory organelles to the 
endolysosomal system (51) as underscored by
the presence of proteins that are implicated in 
lysosome function on SVs, including ARL8A and 
ARL8B, LAMP5, LAMP1, LAMP2, the neuronal 
AP-3 complex, ATG9A, RAB26, and VPS34 (52). 

Mutations in KIF1A are linked to epilepsy,
hereditary spastic paraplegia [a rare gait dis- 
order caused by axonal defects or axonal de- 
generation in the spinal cord (53-55)], and the 


Rizalar et al., Science 382, 223-230 (2023) 


peripheral neuropathy Charcot-Marie-Tooth 
disease (56). We thus speculate that defects in 
PI(3,5)P2-regulated axonal PV transport may be
linked to some of these rare inherited neuro- 
logical disorders and peripheral neuropathies. 
My many passions


“Why are you here? You seem a little bit out of place.” I was attending a conference about
contemporary art, and right before my talk a distinguished scholar approached me. As a
small conversation group formed around us, I introduced myself as a psychology Ph.D.
student who would be presenting research on how the brain reacts to art. Silence descended,
and all eyes were on me. It was true that I was an unusual attendee at what was
otherwise a conference full of art scholars. But his comment hit me like a tidal wave. Suddenly,
the room felt intimidating. Only later did I learn how to stand up for myself and be proud of
my status as an interdisciplinary researcher.


Years earlier, when I was a master’s 
student in cognitive neuroscience, 
my professors advised me, “Marta, 
narrow your focus. Choose one in- 
terest and pursue a single goal.” I 
know they were giving me valuable 
advice. But the reality was that my 
mind was a whirlwind of ideas, and 
I found myself captivated by many 
subjects—some far outside neuro- 
science. Art had always been an integral
part of my life: I grew up in a
family of artists and art teachers. I 
was also fascinated by new techno- 
logy and how it allowed researchers 
to explore uncharted waters. 

So, I disregarded my professors’ 
advice and looked for ways to merge 
my disparate passions. I scoured job 
opportunities, searching for roles 
that would allow me to integrate art, 
neuroscience, and technology. For a 
while it seemed an impossible mission. Then I stumbled on 
an opportunity to work in a psychology lab that welcomed 
not only scientists, but also artists and technologists. Inter- 
disciplinarity was not merely encouraged; it was celebrated 
as the essence of groundbreaking research. I devised a 
Ph.D. project that involved imaging the brain and asking 
people how they felt when they were exposed to art via im- 
mersive technologies. 

When I entered the Ph.D. program and started the proj- 
ect, I felt I had finally found my place. I dove into the study 
of art history, as well as technologies that allowed people to 
have new aesthetic experiences. I had colleagues from other 
fields—artists, engineers, and computer scientists—who 
were open to helping me and providing valuable advice. 

However, the research journey didn’t unfold as smoothly 
as I had envisioned. Each scientific conference I attended, 
regardless of its focus, left me feeling like an imposter, as 
academics tried to pigeonhole me into a known professional
category. Looking around, I also noticed
that whereas other students’
research projects were comfortably 
nestled within their respective do- 
mains, I seemed to be setting myself 
up for being a jack of all trades but a 
master of none. 

I wondered whether I should give 
up, convinced I had veered off course 
irreparably. But then, after a long day 
at work, I attended the opening of 
an art exhibit at a local museum. It 
was dedicated to history’s most fer- 
vent advocate of interdisciplinarity: 
Leonardo da Vinci, an artist, sci- 
entist, and inventor. I hung on the 
words of the museum guide: “Had 
he not been intrigued by myriad sub- 
jects, his achievements would have 
remained elusive.” 

These simple words, coming at a 
profoundly challenging juncture in 
my life, reshaped my perspective and renewed my determi- 
nation. From then on, when I spoke with scholars, I made it a 
point to articulate my unique perspective, emphasizing that 
I didn’t profess to be an expert in any field. Instead, I saw 
myself as a bridge between domains, aiming to synthesize 
knowledge and foster interdisciplinary connections. I wel- 
comed and learned from their criticisms. And I reached out 
to other interdisciplinary researchers to discuss the hurdles 
they encountered and how they overcame them. 

In the end, my academic journey has taught me that you 
may not have to choose among your passions—you may be 
able to pursue them all. For me, that’s turned into a rewarding
opportunity. I’ve found that the true value of interdisciplinary
research lies in the fresh and unique perspectives
you can create by melding various disciplines’ viewpoints. 


Marta Pizzolante is a Ph.D. student at the Catholic University of Milan. 
Precise In Vivo Luminescence Imaging
AMSBIO has announced a new range of pre-made lentivirus products that
produce highly sensitive and precise in vivo luminescence imaging, thanks to
a proprietary lentiviral vector system.
The new lentivirus products are based 
upon a novel bioluminescence imag- 
ing method called “Nano-Lantern,” which uses a bioluminescence 
resonance energy transfer approach to significantly enhance 
signal intensity. Nano-Lantern bioluminescence imaging also
has an extremely low-level background signal, making it more
sensitive and quantitative. When linked to a targeting protein, 
Nano-Lantern lentivirus products emit light in response to specific 
biological activity enabling dynamic real-time visualization in living 
organisms. Combining dynamic visualization with enhanced signal 
intensity and low background noise, Nano-Lantern lentivirus has 
been demonstrated to provide bright and high-resolution imaging 
of small, rapidly moving subcellular structures. Fluorescent markers
and bioluminescence are widely used techniques for in vivo
imaging in living cells. While fluorescence imaging is an extremely 
useful tool, it requires external excitation from laser light, which 
has negative tradeoffs such as autofluorescence, phototoxicity, 
and photobleaching. Bioluminescence imaging does not require 
light activation, but its low brightness emissions typically require 
prolonged exposure times. That gives it its own tradeoff, such
as greater difficulty observing small, rapidly moving structures.
According to AMSBIO, this technology is a significant improvement 
compared to previous methods. 

AMSBIO 

For info: +1-617-945-5033 
www.amsbio.com/nano-lantern-luminescence-imaging/ 


CRISPR Mosaicism-Screening Platform 

Despite the proven efficiency and speed of using CRISPR/Cas9 for 
the generation of custom genetically engineered mouse models, 
concerns remain around the unpredictability of founder mosaicism
and the potential for off-target gene editing, adding risk to the
process of developing a quality model that could ultimately prolong 
the time to experimental data. Taconic Biosciences’ ExpressMODEL® 
CRISPR platform was designed to alleviate that problem by employ- 
ing next generation sequencing to assess mosaicism and screen for 
potential off-targets in gene-edited founder mice. This sequencing 
informs the selection of a suitable founder for germline transmis- 
sion and enables in vitro fertilization and expansion of the genetic 
line one generation earlier than traditional approaches. By providing 
a larger cohort of off-target screened, heterozygous mice that can 
be immediately used to produce homozygous study animals, the 
ExpressMODEL® CRISPR service can expedite study results by at 
least 3 months. 

Taconic Biosciences 

For info: +1 888-TACONIC (888-822-6642) 

www.taconic.com 


Dual Detector for GPC/SEC 

TESTA Analytical has released the COMBO-ONE Viscometer/DRI 
detector for demanding GPC/SEC applications, such as investigating 
polymer structure and branching. Viscometry detectors for GPC/SEC 
have traditionally used four matched capillary tubes in a Wheatstone 
bridge configuration, but suffered from limited efficacy as a result. 
The COMBO-ONE was intended to bypass those limitations.
“By eliminating the fundamental drawback of the traditional online 
viscometer design, the known ‘sample breakthrough’ after each 
measurement, our dual detector allows much more rapid sequenc- 
ing of samples,” says Carlo Dessy, the COMBO-ONE’s developer. 
“The result of our design innovation is to increase the total daily 
throughput of a GPC/SEC System by a factor of two.” This leads to 
“a significant reduction of solvent required for each measurement 
and consequently the amount of waste produced.” The COMBO-ONE 
Viscometer/DRI detector from TESTA Analytical is fully compatible 
with all modern GPC/SEC systems. 

Testa Analytical Solutions 

For info: +49-30-864-24076 
www.testa-analytical.com/gpc-sec-chromatography.html 


Cancer Screening Platform 

ScreenIn3D has developed a cancer screening platform that com- 
bines the latest advances in microfluidics and 3D cell culture to make 
tumor biopsies easier. With new chip technology, researchers can 
use as few as 1000 cells for tens of 3D cancer screening experiments. 
Following recent investments from Gabriel Investments Ltd, Scottish 
Enterprise, and the University of Strathclyde’s Entrepreneurial Fund, 
the company has announced plans to expand the capabilities of the 
platform for testing new tumor indications, scale up chip manufac- 
ture, and make the platform available to pharma and biotech com- 
panies who want to license the technology for in-house use. “There 
are challenges with evaluating solid tumors, not just cancerous cells 
but those surrounding them and their interactions with immunocells,”
said ScreenIn3D CEO Dr. Michele Zagnoni in a release. “Tumor
tissue is a precious resource, which is underutilized in drug develop- 
ment due to high cost and its limited quantity.” 

ScreenIn3D 

For info: +44-141-444-7368 

www.screenin3d.com 


MIRO CANVAS 

INTEGRA Biosciences has announced the launch of MIRO CANVAS, a 
compact digital microfluidics platform for fully automated next gen- 
eration sequencing (NGS) sample preparation. The system simplifies 
NGS workflows for more walk-away time and higher lab productivity,
accelerating genomics discoveries. NGS has made it possible
to sequence entire genomes with less cost and time compared to
earlier sequencing methods. However, sample and library preparation
for NGS workflows still remain a challenge, as they are notoriously
complex, time-consuming, and error-prone when performed
manually. MIRO CANVAS was developed to meet the growing need 
for fully automated NGS sample preparation with verified protocols 
for short- and long-read sequencing applications to ensure accuracy, 
precision, and reliability for high-quality results. The microfluidics 
system uses gentle sample handling to maintain the integrity of high 
molecular weight DNA and minimizes reagent usage for long-read 
sequencing library preparation. MIRO CANVAS can also automate
exome and other hybrid capture protocols, some of the most laborious
protocols in NGS sample preparation. According to INTEGRA,
the platform requires only 15 minutes of hands-on time per run, 
freeing up researchers to work on other vital tasks in the lab. It also 
offers fast, on-demand preparation of samples, so clinical specimens 
can be processed straight away upon receipt—without the need for 
batching—streamlining genomics workflows. 

INTEGRA 

For info: +44 (0) 1480 405333 

www.integra-biosciences.com 